FR99/01513

NUCLEIC SEQUENCE AND DEDUCED PROTEIN SEQUENCE FAMILY 
WITH HUMAN ENDOGENOUS RETROVIRAL MOTIFS, AND THEIR USES 

The present invention relates to a novel
nucleic sequence and deduced protein sequence family
with complete or partial human endogenous retroviral
motifs.

The invention also relates to the detection
and/or use of said nucleic sequences and of said
corresponding protein sequences in the context of
diagnostic, prophylactic and therapeutic applications,
in particular for neuropathological conditions with an
autoimmune component such as multiple sclerosis.

The invention also relates to the production of
antisense double-stranded and single-stranded nucleic
probes and of ribozymes capable of modulating viral
replication (T.R. Cech, Science, 1987, 236, 1532-1539;
R.H. Symons, Trends Biochem. Sci., 1989, 14, 445-450),
of the corresponding recombinant molecules, and of
associated antibodies.

Retroviruses are viruses which replicate solely
by using the opposite route to the conventional
processing of genetic information. This process, called
reverse transcription, is mediated by an RNA-dependent
DNA polymerase or reverse transcriptase, encoded by the
pol gene. Retroviruses also encode at least two
additional genes. The gag gene encodes the proteins of
the skeleton, matrix, nucleocapsid and capsid. The env
gene encodes the envelope glycoproteins. Retroviral
transcription is regulated by promoter regions or
"enhancers" situated in highly repeated regions or LTRs
(Long Terminal Repeats), which are present at both
ends of the retroviral genome.

During the infection of a cell, the polymerase
makes a DNA copy of the RNA genome; this copy may then
integrate into the human genome. Retroviruses do not
kill the cells which they infect, but on the contrary
often enhance their rate of growth. Retroviruses can
infect germ cells or embryos at an early stage; they
can, under these conditions, integrate into the germ line
and be transmitted by vertical Mendelian transmission,
which constitutes the closest relationship between a
host and its parasite. These endogenous viruses can
degenerate over generations of the host organism and
lose their initial properties. However, some of them
may conserve all or part of their properties or of the
properties of their constituent motifs, or acquire
novel functional properties having an advantage for the
host organism, which would explain the preservation of
their sequence.

The existence of endogenous motifs having long 
open reading frames and/or subjected to a strong 
selection pressure can therefore be an indication of a 
preserved or acquired biological function, which may 
correspond to a benefit for the host organism. These 
retroviral sequences can also undergo, over the 
generations, discrete modifications which will be able 
to trigger some of their potentials and generate or 
promote pathological processes. It has recently 
appeared necessary to carry out a review and to 
identify these sequences so as to be able to evaluate 
their functional impact. 

Human endogenous retroviral sequences or HERVs 
represent a substantial part of the human genome. These 
retroviral regions exist in several forms: 

- complete endogenous retroviral structures 
combining gag, pol and env motifs, flanked by repeat 
nucleic sequences which exhibit a significant analogy 
with the LTR-gag-pol-env-LTR structure of infectious 
retroviruses, 

- truncated retroviral sequences; for example 
the retrotransposons lack their env domain and the 
retroposons do not possess the env and LTR regions. 

Up until now, the study of these regions of the 
genome has been neglected in humans for essentially two 
reasons:

- the existence of insertions/deletions which
can shift the reading frame and of mutations which
modify the sequence. These modifications cause
impairment of the structure and consequently of the
biological function of these motifs,

- the absence of confirmed associations with
human pathological conditions.

The recent availability of fragments which are
significantly representative of the human genome and the
orientation of research toward the study of the
structure/function of endogenous retroviral motifs have
made it possible to specify the importance of these
regions. The involvement of truncated or complete
endogenous sequences in pathological conditions in
animals is documented; for example their association
with tumor processes has been clearly demonstrated
(S.K. Chattopadhyay et al., 1982, Nature, 295, 25-31).
Research aimed at specifying the association or the
influence of HERVs in human pathological conditions is
now therefore justified.

A classification of the HERV elements has been
proposed (Tonjes R.R. et al., AIDS & Hum. Retrovirol.,
1996, 13, p261-p267; A.M. Krieg et al., FASEB J., 1992,
6, 2537-2544). It is based on a homology of these
sequences with retroviruses isolated in animals, with
the aid of heterologous retroviral probes. Indeed, in
general, the HERVs exhibit relatively little homology
with known human infectious retroviruses.

The class I families exhibit a sequence
homology with the type C mammalian retroviruses; there
may be mentioned in particular the ERI superfamily,
close to the MuLV virus (murine leukemia virus) and to
the BaEV virus (baboon endogenous virus).

The class II families exhibit a sequence
homology with the type B mammalian retroviruses such as
MMTV (mouse mammary tumor virus) or the type D
retroviruses such as SRV (squirrel monkey retrovirus).



Other families have also been described; among
these, there may be mentioned HERVs which exceptionally
exhibit partial homology with HTLV-1 (RTVL-H) or
primate viruses; HRES-1, for example, exhibits sequence
homology with HTLVs.

Large-scale programmes for sequencing the
human genome now make available a
significant number of novel retroviral sequences. The
use of data processing software packages makes it
possible to identify and analyse these genes. In this
context, a systematic search covering all the
information available to date has been initiated in
order to identify novel human endogenous retroviral
sequences according to certain analytical criteria:

- presence of long open reading frames
conserved during evolution of the host organism and 
which may suggest a biological function, 

- analogy with sequences already characterized 
outside or inside the retrovirus domain, 

- location in regions of susceptibility for 
certain pathological conditions or close to essential 
genes, for example in the cancer domain, regulation of 
the immune system or in certain neuropathological 
conditions.

The work carried out by the inventors on 
sequence databases allowed them to identify a set of 
endogenous retroviral sequences or motifs whose normal 
or pathological expression can promote or disrupt a 
protective effect in relation to pathological 
processes, or play a role in the onset or worsening of 
pathological conditions.

The subject of the present invention is a
purified nucleic acid fragment, characterized in that
it comprises all or part of a sequence encoding a human
endogenous retroviral sequence, which has at least
env-type retroviral motifs, corresponding to the sequence
SEQ ID NO: 1 or to a sequence exhibiting a level of
homology with said sequence SEQ ID NO: 1 greater than
or equal to 80% on more than 190 nucleotides or greater
than or equal to 70% on more than 600 nucleotides for
the env-type domains.
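
The two alternative thresholds stated above for the env-type domains can be read as a simple decision rule. The following Python sketch is illustrative only: the function names are hypothetical, and identity is computed position by position over a pre-aligned, gap-free pair of sequences, which is a simplifying assumption and not necessarily the alignment procedure used to establish the homology levels of the invention.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_env_criterion(identity_pct: float, aligned_nt: int) -> bool:
    """True if identity is >= 80% over more than 190 nt,
    or >= 70% over more than 600 nt (env-type domains)."""
    return (identity_pct >= 80 and aligned_nt > 190) or (
        identity_pct >= 70 and aligned_nt > 600
    )
```

Under this reading, an alignment of 650 nucleotides at 72% identity satisfies the criterion, whereas the same identity over exactly 600 nucleotides does not, since the text requires "more than" 600 nucleotides.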

The expression "homologous sequence" is
understood to mean both a sequence which exhibits
complete or partial identity with the abovementioned
sequence SEQ ID NO: 1 and a sequence which exhibits
partial similarity with said sequence SEQ ID NO: 1.

According to an advantageous embodiment of said
fragment, it has retroviral motifs corresponding to an
env domain and corresponding to the sequence
SEQ ID NO: 1 and retroviral motifs corresponding to a
gag domain and corresponding to the sequence
SEQ ID NO: 2 or to a sequence exhibiting a level of
homology greater than or equal to 80% on more than 190
nucleotides or greater than or equal to 70% on more
than 600 nucleotides for the env-type domains and a
level of homology greater than or equal to 90% on more
than 700 nucleotides or greater than or equal to 70% on
more than 1200 nucleotides for the gag-type domains,
said motifs having no insertion or deletion of more
than 200 nucleotides.

Said fragments constitute a novel family of
human endogenous retroviral sequences (HERV-7q family)
which exhibits sequence homology with the MSRV
retroviruses, as described in International Application
WO 97/06260; said fragments according to the present
invention have:

- two repeat nucleotide motifs of 711 bp
(Figure 3), having characteristic signals identified in
LTRs (Long Terminal Repeats): transcription promoters
of the TATAA or CCAAT box type. These repeat domains
delimit three deduced motifs of the gag, pol and env
type (Figure 2),

- an env-type motif (positions 6965 nt - 9550 nt
on the sequence SEQ ID NO: 3) which contains a
long open reading frame of 1620 nucleotides (positions
7874-9493 of the sequence SEQ ID NO: 3) encoding a protein
having an unpublished sequence of 540 amino acids
(Figure 4 and underlined fragment in SEQ ID NO: 27).
There is present inside the transmembrane domain of
this env domain a peptide motif of the CKS-25/CKS-17
type (Figure 5), recognized as having immunosuppressive
functions on the host lymphocytic cells (M. Mitani et
al., 1987, Proc. Natl. Acad. Sci. USA, 84, 237-240). A
zinc finger type domain HX3-4HX22-33CX2C (Kulkosky et
al., 1992, Mol. Cell. Biol., 12, 2331-2338), which is
present in integrase-type domains, is identified in
another reading frame. This particular env domain
is a signature characteristic of the novel endogenous
retroviral motifs,

- the motif (positions 3065 nt - 4390 nt on the
sequence SEQ ID NO: 3) of the gag type encoding protein
motifs according to Figure 6 (SEQ ID NO: 51) (positions
3118-4198 of SEQ ID NO: 3) was identified by virtue of
analogies with known gag domains. The region of major
homology QX3EX7R is for example present (Benit et al.,
1997, J. Virol., 71, 5652-5657). The nucleic acid
binding motif CX2CX3-4HX4C, situated at the C-terminal
position, is identified in another reading frame (Covey
et al., 1986, Nucleic Acids Res., 14, 623-633).
Upstream of the gag domain, a motif of 182 nucleotides
is detected which is repeated twice (Figure 1),

- the pol domain exhibits the conventional
consensus of a retrovirus pol region at the level of
the protease, reverse transcriptase and RNAse H
domains. A motif close to the consensus LLDTGA is found
in pol (Weber et al., 1988, Science, 243, 928-931). The
motifs D and AF, LPQ and SP, and YVDD (Xiong and
Eickbush, 1990, EMBO J., 9, 3353-3362) are respectively
found in the 3rd, 4th and 5th homology boxes. The
motifs YTDGSS and TDS are present in the RNAse H
region,

- the gag and pol regions could be considered
as being joined with a passage from the gag region to
the pol region by a reading frame shift.
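
The figures and consensus motifs quoted in the list above lend themselves to a quick consistency check. The sketch below is a hedged illustration in Python: it verifies the arithmetic of the env open reading frame from the positions given in the text, and expresses two of the quoted consensus motifs (with X read as "any residue") as regular expressions. The helper `find_motif` and the toy peptide are hypothetical and are not taken from any SEQ ID of this document.

```python
import re

# Positions quoted in the text for the env open reading frame (1-based, inclusive).
ORF_START, ORF_END = 7874, 9493
orf_length_nt = ORF_END - ORF_START + 1   # 1620 nucleotides
orf_codons = orf_length_nt // 3           # 540 codons, consistent with 540 amino acids

# Consensus motifs quoted in the text, with X = any residue.
ZINC_FINGER = re.compile(r"H.{3,4}H.{22,33}C.{2}C")   # HX3-4HX22-33CX2C
MAJOR_HOMOLOGY = re.compile(r"Q.{3}E.{7}R")           # QX3EX7R


def find_motif(pattern, protein):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical toy peptide built only to exercise both patterns.
toy = "MK" + "H" + "AAA" + "H" + "G" * 22 + "C" + "LL" + "C" + "QAAAEGGGGGGGR"
zinc_hits = find_motif(ZINC_FINGER, toy)
mhr_hits = find_motif(MAJOR_HOMOLOGY, toy)
```

Because the text states that some motifs are identified "in another reading frame", a real search would have to translate the nucleotide sequence in each frame before applying such patterns; that step is omitted here.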

The present invention includes the sequences
belonging to the HERV-7q family as defined above
(presence of the SEQ ID NO: 1 sequence or of a
homologous sequence or presence of both the sequences
SEQ ID NO: 1 and SEQ ID NO: 2) and in particular the
sequences SEQ ID NO: 3-21; it also includes the
complementary nucleic sequences and the reverse
sequences complementary to the preceding sequences as
well as fragments derived from the coding regions of
the preceding sequences corresponding to a shifting
frame greater than or equal to 14 nucleotides or their
complementary sequences (SEQ ID NO: 30-50).

These various fragments may be advantageously
used as primers or as probes; they hybridize
specifically to a sequence of the HERV-7q family.

Among these fragments, the following fragments
may be preferably mentioned:

- a fragment of 182 nucleotides, repeated
twice, situated upstream of the gag domain at positions
2502-2611/2613-2865 of SEQ ID NO: 3:

Primers and probes specific for the gag region

- a sense primer G1F located in the region
upstream of the gag domain of HERV-7q:
5' GGACCATAGAGGACACTCCAGGACTA 3' (SEQ ID NO: 30)

- an antisense primer G1R located in the
terminal 3' region of the gag domain:
5' CCTCAGTCCTGCTGCTGGATCATCT 3' (SEQ ID NO: 31)

- the fragment of 1505 nt amplified by the pair
G1F-G1R is used in order to generate the probes capable
of hybridizing the various PCR amplification products:

- a nested sense primer G2F: (SEQ ID NO: 32)
5' CCTCCAAGCAGTGGGAGGAAGAGAATT 3'

- a nested antisense primer G2R: (SEQ ID NO: 33)
5' CCTTCCCTGTGTTATTGTGGACATCATT 3'

- a nested sense primer G4F: (SEQ ID NO: 34)
5' GGAAGAAGTCTATGAATTATTCAATGATGT 3'

- a nested sense primer G3F: (SEQ ID NO: 35)
5' GGGACACAGAATCAGAACATGGAGATT 3'

- a nested antisense primer G4R: (SEQ ID NO: 36)
5' GCCTTCAGAAGAGTCAGGTGACAGAGA 3'

- a nested antisense primer G5R: (SEQ ID NO: 37)
5' GAGCCTCCAAAGTCCACTTGCCTGA 3'

Primers and probes specific for the env region

- a sense primer E1F: (SEQ ID NO: 38)
5' GATTTCAGTATCTACTAGTCTGGGTAGAT 3'

- an antisense primer E1R: (SEQ ID NO: 39)
5' CTAGGAAATCCAGCTAGTCCTGTCTCA 3'

- the fragment of 2529 nt, amplified by the pair
of primers E1F-E1R, is used to generate the probes
capable of hybridizing the various PCR amplification
products:

- a sense primer E2F: (SEQ ID NO: 40)
5' CCAAGACAGCCAACTTAGTTGCAGACAT 3'

- an antisense primer E2R: (SEQ ID NO: 41)
5' GGACGCTGCATTCTCCATAGAAACTCTT 3'

- a sense primer E3F: (SEQ ID NO: 42)
5' GCAATACTACATACACAACCAACTCCCAA 3'

- an antisense primer E3R: (SEQ ID NO: 43)
5' GGGGGAGGCATATCCAACAGTTAGTA 3'

- a sense primer E4F: (SEQ ID NO: 44)
5' CCATCTACACTGAACAAGATTTATACACTT 3'

- an antisense primer E4R: (SEQ ID NO: 45)
5' AATGCCAGTACCTAGTGCACCTAGCACT 3'

- a sense primer E5F: (SEQ ID NO: 46)
5' CGAATACAACGTAGAGCAGAGGAGCTTCGAA 3'

- a sense primer E6F: (SEQ ID NO: 47)
5' AGCCCAAGATGCAGTCCAAGACTAAGAT 3'

- a primer E5R: (SEQ ID NO: 48)
5' GCGTAGTAGAGGTTGTGCAGCTGAGAT 3'

- a primer ExF: (SEQ ID NO: 49)
CCCTTACCAAGAGTTTCTATGGAGAAT

- a primer ExR: (SEQ ID NO: 50)
ACCGCTCTAACTGCTTCCTGCTGAATT

All the oligonucleotides are designed to be able
to generate a sense primer and an antisense primer by a
shift in the sequence of the reference primer of 1 to 7
nucleotides toward the 5' side or toward the 3' side;
the modification of the sequence may cause a
modification of the size of the primer of 1 to 7
nucleotides depending on the case. The primers chosen
may be optimized, depending on the case, by shortening
or extension affecting 1 to 9 nucleotides.
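
The shifting of a reference primer by 1 to 7 nucleotides toward either side can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function names `shifted_primers` and `reverse_complement` are hypothetical, the primer length is held constant during the shift, and the template used in practice would be the relevant region of SEQ ID NO: 3, which is not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shifted_primers(template: str, start: int, length: int, max_shift: int = 7) -> dict:
    """Map each shift in [-max_shift, +max_shift] to the primer of the same
    length read from the template at the shifted position (0-based start).
    Negative shifts move toward the 5' side, positive shifts toward the 3' side;
    shifts that would run off the template are skipped."""
    primers = {}
    for shift in range(-max_shift, max_shift + 1):
        s = start + shift
        if s >= 0 and s + length <= len(template):
            primers[shift] = template[s:s + length]
    return primers
```

An antisense variant of any shifted window can then be obtained with `reverse_complement`. Shortening or extension by 1 to 9 nucleotides, also mentioned above, would correspond to varying `length` in the same way.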

Preferably, the hybridization, cloning,
subcloning, production, preparation and analysis of the
nucleic acids, peptides and antibodies, the sequencing
of the nucleic acids and peptides, the in situ
hybridization and the immunohistochemistry are carried
out under the conditions described in the following
books:

- Current Protocols in Molecular Biology, Eds.
F.M. Ausubel, R. Brent & R.E. Kingston et al., Greene
Publishing Associates and Wiley Interscience.

- Molecular Cloning: a laboratory manual. Eds.
J. Sambrook, E.F. Fritsch & T. Maniatis, Cold Spring
Harbor Laboratory Press, Cold Spring Harbor.

- The Practical Approach series. Eds.
D. Rickwood & B.D. Hames, IRL Press and Oxford
University Press. In particular: Antibodies I & II; DNA
cloning I, II, III; Nucleic acid and protein sequence
analysis; Nucleic acid hybridization; Nucleic acid
sequencing; Oligonucleotide synthesis; Protein
purification applications; Protein purification
methods; Protein sequencing; Transcription and
translation; Gel electrophoresis of nucleic acids;
Gel electrophoresis of proteins; Genome analysis; HPLC
of macromolecules; Human genetic diseases;
Microcomputing in biology; Molecular neurobiology;
Mutagenicity testing; Essential molecular biology I &
II.

- Proteome research: New frontiers in
functional genomics, Eds. M.R. Wilkins et al.,
Springer.

The human endogenous retroviral sequence
(SEQ ID NO: 3) situated on the long arm of chromosome 7
corresponds to the HERV-7q sequence; it is 10.5 kb long
(Figs. 1 and 2) and satisfies the criteria defined
above.

The search for domains exhibiting total or
partial similarity with the gag and env regions of
HERV-7q resulted in the identification of novel
endogenous retroviral sequences. These sequences may
have the structure of a complete endogenous retrovirus
such as the endogenous retroviral sequence situated
close to the gene for the alpha and delta subunits of
the T cell receptor, and consequently called HERV-TcR;
by way of example, Figure 7 shows the comparison of the
nucleic alignments of the respective gag domains of
HERV-7q and HERV-TcR (sequence HG12, SEQ ID NO: 18).

Partial retroviral structures also exist. These
retroviral domains, similar to HERV-7q, are identified
in independent nucleic sequences as shown by their
chromosomal location. Nucleic motifs (called here HEx
or HGx, and analogous to env or gag type domains,
respectively) resembling the env or gag domains of
HERV-7q were found, with the aid of the abovementioned
databases:

- HE2: chromosome 17 (SEQ ID NO: 4),

- HE3 and HG3: chromosome 6 (SEQ ID NO: 5 and 6),

- HE4: chromosome X (SEQ ID NO: 7),

- HE5: chromosome X q22 (SEQ ID NO: 8),

- HE6 and HG6: chromosome 1 q23.3-q24.3 (SEQ ID
NO: 9 and 10),

- HE7: chromosome 7 p15 (SEQ ID NO: 11),

- HE8 and HG8: chromosome 19 (SEQ ID NO: 12 and
13),

- HE9: chromosome X (SEQ ID NO: 14),

- HE10: chromosome X q13.1-21.1 (SEQ ID NO: 15),

- HE11 and HG11: chromosome 7 q21-22 (SEQ ID NO:
16 and 17),

- HE12 and HG12, in HERV-TcR: chromosome 14 q11.2
(SEQ ID NO: 18 and 19).

The alignments of the env (Fig. 8) and gag
(Fig. 9) domains show the levels of homology
observed between the sequences described above and the
homologous sequences in HERV-7q. The analogies can
extend to the flanking retroviral motifs.

Analysis of the sequence tags available in
databases shows that transcripts belonging to some
members of this family, in particular HERV-7q, are
essentially expressed in tissues of foetal or placental
origin.

Polypeptide sequences generated by these
transcripts can therefore potentially be produced, and
biological functions or activities can be envisaged by
analogy with biologically active polypeptides of viral
or retroviral origin; for example, the peptide motifs
of the CKS-17 type (Fig. 5) or CKS-25 type (Huang S.S.
and Huang J.S., J. Biol. Chem. 1998, 273, 4815-4818)
which have immuno-modulatory functions on the
lymphocytic host cells. The differences in sequence
which are observed and possible normal or pathological
modifications are in particular responsible for
modulation of the function.

HERV-7q represents the paradigm of the novel
family of human endogenous retroviral sequences or of
endogenous retroviral motifs.

HERV-7q and some of the endogenous retroviral
sequences belonging to its family have a pol-type
domain analogous to pol-type retroviral sequences such
as for example the pol region identified in the MSRV
retrovirus associated with multiple sclerosis and
described by H. Perron et al. (1997, Proc. Natl. Acad.
Sci. USA, 94, 7583-7588; International Application PCT
WO 97/06260).

However, the sequences according to the present
invention are distinguishable from the infectious
exogenous retroviral sequences analogous to MSRV
previously described in that the gag and env sequences
according to the invention are significantly different
according to the criteria defined above and as a
function of certain specific characteristics, for
example the long open reading frame of the env domain
of HERV-7q; they could serve as a signature
of a pathological condition when they have insertions,
deletions, reading frame shifts or mutations.

Indeed, the differences observed between the
human sequences of the HERV-7q type, which are isolated
from individuals reputed to be normal, and the
sequences derived from some samples of pathological
origin are not randomly distributed. Comparisons
carried out between the gag region obtained from
infectious retroviral particles (EMBL accession No.:
A60168, A60200, A60201, A60171 and the like) and the
corresponding gag sequence of HERV-7q (Fig. 9) make it
possible to observe that the mutations preferentially
affect nonsense codons. For example, two nonsense
codons in HERV-7q are replaced by an arginine codon in
A60200, which makes it possible to obtain a deduced
sequence of 109 amino acids for HERV-7q and of 166
amino acids for A60200. The base changes consequently
make it possible to extend the reading frame and to
potentially encode larger sized polypeptide structures
(Figure 10).
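
The way in which replacing a nonsense codon by an arginine codon extends the reading frame can be shown on a toy example. The Python sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, the function name `translate_until_stop` is hypothetical, and the two 15-nucleotide sequences are invented for the demonstration and are unrelated to HERV-7q or to the EMBL entries cited above.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(dna: str) -> str:
    """Translate frame 1 of a DNA sequence, stopping at the first nonsense codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy sequences: the only difference is TGA (stop) -> CGA (arginine).
with_stop = "ATGGCTTGAGGTTAA"     # ATG GCT TGA GGT TAA
with_arg = "ATGGCTCGAGGTTAA"      # ATG GCT CGA GGT TAA
short_product = translate_until_stop(with_stop)   # "MA"
long_product = translate_until_stop(with_arg)     # "MARG"
```

A single base change thus lengthens the deduced product from two to four residues in this toy case, which is the same mechanism as the extension from 109 to 166 amino acids described above.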

Likewise, an env-type sequence obtained from
infectious retroviral particles exhibits a significant
analogy with the env domain of HERV-7q (Figure 11).
These marked analogies between exogenous and endogenous
retroviral sequences could be responsible for the
triggering or worsening of certain pathological
processes, in particular certain autoimmune diseases
such as multiple sclerosis. In this regard, it is
possible to note that certain endogenous retroviral
sequences described in the invention are situated close
to or in regions reputed to exhibit susceptibility to
multiple sclerosis: for example HERV-7q and the 7q21-22
region of chromosome 7, HE12 and HG12 in
HERV-TcR and the region of the gene encoding the alpha
and delta chains of the T cell receptor, HE2 and
chromosome 17, or HE3 and HG3 and chromosome 6.

No significant homology is observed with
endogenous retroviral sequences already described; on
the other hand, a limited homology, in any case below
the criteria defined according to the invention, may be
noted between the env
domains of the sequence HERV-7q (SEQ ID NO: 1) and the
sequence HERV-9 (Figure 12). Figure 13 shows extensive
homologies between the sequence HERV-7q and an
exogenous retroviral sequence (accession No. EMBL:
A60170).

The human endogenous retroviral sequences
belonging to the HERV-7q family can protect against
attacks linked to the environment or can be beneficial
for the individual. This beneficial effect could be one
of the possible reasons for the selection pressure
exerted on some of these sequences and the potentially
functional character of the deduced protein structures
identified: for example the long open reading frame
capable of encoding a novel protein and corresponding
to the env domain of HERV-7q.

The human endogenous retroviral sequences
belonging to the HERV-7q family could be associated,
for example, with pathological conditions related to
processes linked to cancer, to neuropathological
conditions with an autoimmune component or to any other
pathological process in association or otherwise with
endogenous or exogenous viruses or retroviruses. Their
action could be related to the onset, the worsening or
the modification of the time of appearance of the
disease, or to protection against it.

In the context of application to autoimmune
pathological conditions (such as for example lupus,
Sjogren's syndrome, rheumatoid arthritis, multiple
sclerosis and the like), significant analogies may be
detected between the endogenous retroviral motifs
identified and motifs found in retroviral structures
characterized in patients with autoimmune pathological
conditions such as multiple sclerosis; for example,
fragments of gag domain (recently available in
databases) obtained from infectious retroviral
particles or the complete sequence of the pol domain
corresponding to the MSRV virus associated with
multiple sclerosis. These retroviral motifs possess
significant analogies with homologous endogenous
sequences of the HERV-7q type, which makes it possible
to envisage direct or indirect association with
pathological processes, including multiple sclerosis,
in association or otherwise with MSRV.

The presence of some sequences or motifs can be
observed in multiple sclerosis susceptibility regions:
for example, the sequences HE11 and HG11 around the
region 7q21-22, or HE4, HE5, HE6, HE9,
HE10 or HG10 on the X chromosome, are located in or
near chromosomal regions regularly associated with
multiple sclerosis susceptibility genes. These
sequences should provide means for the localisation or
identification of predisposition genes.

The importance of these sequences goes beyond
the context of autoimmune diseases. Apart from the
general importance of retroviral motifs in the
triggering or worsening of a tumor process, which is
well established in particular in murine models (H. Fan
in The Retroviridae, 1994, ed. J. A. Levy, Plenum, New
York, p. 313-353), these sequences could be present
close to or inside important genes and could alter the
expression thereof: for example HERV-TcR and the genes
for the alpha and delta subunits of the receptor for
the T cells involved in disruptions of the immune
system.

The subject of the invention is also
transcripts generated from the abovementioned sequences
as well as those optionally exhibiting modifications in
the reference sequences described in the invention when
they are expressed in certain patients.



Indeed, the systems for regulating the
expression of the retroviral proteins of HERV-7q, which 
are present in the LTR type motifs, could influence the 
expression of genes situated in the close or distant 
chromosomal vicinity and could induce disruptions of an 
immunological and/or neurological character. For 
example, the endogenous retroviral sequence HERV-TcR 
exists in the immediate vicinity of the genes for the 
alpha and delta subunits of the T cell receptor 
previously described. The LTR-type motifs could also 
encode superantigens (Acha-Orbea and Palmer, 1991, 
Immunol. Today, 12, 356-361). In general, retroviral 
proteins of the HERV-7q or related type, or their 
truncated or partial forms could be involved in 
cytotoxicity or superantigenicity phenomena, such as 
for example those derived from the long open reading 
frame identified in the env domain (Figure 4). 

In this regard, it is possible to note that 
retroviral motifs derived from defective regions are 
capable of having biological functions; for example, 
the envelope protein p15E, derived from defective
retroviral motifs, possesses an anti-inflammatory and
immunosuppressive activity (Snyderman and Cianciolo,
1984, Immunol. Today, 5, 240-244).

These structures are probably capable of 
causing breaks or of amplifying deregulations in the 
immune defense processes. Some of the motifs of the 
gag, env and LTR-type domains may be associated with a 
particular function or may contribute to the normal or 
pathological function of the flanking domains. 
Recombinations with an element of exogenous, retroviral 
origin or otherwise can give rise to the production of 
nucleic or protein motifs which could either protect or 
trigger or promote or worsen a pathological condition. 
Likewise, a retroviral structure containing endogenous
retroviral elements according to the invention would be
capable of causing a pathological process after passing
through an exogenous transient cycle followed by
reintegration into a sensitive or critical region of
the human genome. Likewise, the combination of
motifs belonging to the HERV-7q family, or of elements
induced by motifs belonging to the HERV-7q family, with
motifs of exogenous origin or induced exogenously would
be capable of triggering or worsening a pathological
process or on the contrary of promoting protection or
partial remission or a complete and permanent cure.

The detection of the HERV-7q type domains, now
made possible, suggests possible applications at the
prophylactic, prognostic and diagnostic level; for
example, immunological approaches or gene
amplification, which make it possible to compare normal
individuals serving as reference with patients, would
be capable of promoting screening, of improving early
detection of the onset of the disease and/or of
monitoring the progression of a pathological condition
in patients who may exhibit a susceptibility or in
whom the disease has already appeared, or in
individuals considered to be normal, based on current
clinical criteria.

The specific nucleic and immunological probes,
as defined in the present invention, are capable of
promoting the identification and detection of motifs
which are abnormally expressed in the context of
pathological conditions associated with cancer, or of
neuropathological conditions, in particular autoimmune
pathological conditions, at the forefront of which is
multiple sclerosis.

Therapeutic strategies may be envisaged by
using some of the nucleic sequences contained in
HERV-7q and the sequences of the same family or deduced
polypeptide structures or by the use of peptides or
proteins, or of specific antibodies.

The subject of the present invention is also hybrid nucleic
sequences, characterized in that they combine
sequences or motifs belonging to the HERV-7q family, or
elements induced by motifs belonging to the HERV-7q
family, with motifs of exogenous origin or induced
exogenously (exogenous retroviral sequences); such
hybrid sequences are probably capable of triggering or
worsening a pathological process or on the contrary of
promoting protection or partial remission or a complete
and permanent cure.

The subject of the present invention is also a
diagnostic reagent for the differential detection of
complete or partial human endogenous nucleic sequences,
having retroviral motifs, selected from the sequences
SEQ ID NO: 1 and/or SEQ ID NO: 2, characterized in that
it is selected from the group consisting of the
sequences SEQ ID NO: 1-50, the complementary nucleic
sequences and the reverse sequences complementary to
the preceding sequences, nucleotide fragments
capable of defining or of identifying the sequences SEQ
ID NO: 1 and/or SEQ ID NO: 2 and any flanking sequence
or any sequence overlapping them, as well as
fragments derived from the coding regions of the
sequences SEQ ID NO: 1-24 corresponding to a shifting
frame greater than or equal to 14 nucleotides or their
complementary sequences, optionally labeled with an
appropriate marker.

The sequences of the nucleic, ribonucleic and
oligonucleotide probes used will be chosen from the env
and gag regions or their flanking regions; for example
the oligonucleotide primers for HERV-7q will be chosen
from the regions situated between nucleotides 3065 and
4390, nucleotides 6965 and 9550 as well as from any
adjacent sequence (upstream or downstream) capable of
allowing specific amplification (Figure 1).
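As an illustration only, the two nucleotide windows quoted above can be written down as data and used to check whether a candidate primer lies inside one of them. This is a minimal sketch: the region labels and the function name are hypothetical, positions are assumed to be 1-based and inclusive, and the "adjacent sequences" that the text also allows are deliberately not modelled.

```python
from __future__ import annotations

# Windows quoted in the text (assumed 1-based, inclusive HERV-7q positions).
# The dictionary keys are hypothetical labels, not names used in the source.
PRIMER_REGIONS = {
    "region_3065_4390": (3065, 4390),
    "region_6965_9550": (6965, 9550),
}


def regions_containing(start: int, end: int) -> list[str]:
    """Return the labels of the quoted windows that fully contain [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [
        label
        for label, (low, high) in PRIMER_REGIONS.items()
        if low <= start and end <= high
    ]
```

A primer that falls outside both windows is not necessarily excluded by the text, since adjacent upstream or downstream sequences may also be used.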

Among the appropriate markers, there may be
mentioned radioactive isotopes, enzymes, fluorochromes,
chemical markers (biotin), haptens (digoxigenin) and
antibodies or appropriate base analogues.
Preferably:

- said reagent is selected from the sequences
SEQ ID NO: 30-50 and is capable of being used as a
primer,

- said reagent is selected from the following
sequences:

a fragment of 1505 nt amplified by the
pair of primers SEQ ID NO: 30 and SEQ ID NO: 31
(primers G1F and G1R),

a fragment of 2529 nt amplified by the
pair of primers SEQ ID NO: 38 and SEQ ID NO: 39
(primers E1F and E1R)

and is capable of being used as a probe.
The subject of the present invention is also a
method for the rapid and differential detection of the
endogenous retroviral nucleic sequences of the env or
env and gag type, their normal or pathological
variants, by hybridization and/or gene amplification,
carried out using a biological sample, which method is
characterized in that it comprises:

(a) a step in which a biological sample to
be analysed is brought into contact with at least one
probe as defined above, and

(b) a step in which the product(s) resulting
from the nucleotide sequence-probe interaction is
detected by any appropriate means.

Said method may further comprise:

prior to step (a):
. a step of preparing the relevant biological
tissue or fluid,
. a step of extracting the nucleic acid to be
detected, and
. at least one gene amplification cycle, and

subsequent to step (b):
. a step of comparing the nucleic sequences
obtained in said biological sample with the human
endogenous retroviral sequences according to the
invention by any appropriate means and in particular by
sequencing, Southern blotting, restriction cleavage,
SSCP or any other method which makes it possible to
identify an insertion or a deletion or a single
mutation between the various sequences compared.
In accordance with the invention, the human
endogenous retroviral sequences according to the
invention are thus compared with the nucleic sequences
present in the biological sample to be analysed and
allow the detection of homologous sequences from
patients suffering from pathological conditions likely
to involve a modification of their genome.

Advantageously, said gene comparisons are 
carried out using genomic DNA obtained from control 
individuals and from patients. 

A conventional gene amplification by PCR will
be carried out with the aid of 5'-sense and
3'-antisense primers delimiting or comprising the zone
to be studied (env zone or gag zone).

Also advantageously, the sequences of the
nucleic, ribonucleic and oligonucleotide probes used
are chosen from the env and gag regions or their
flanking regions; for example the oligonucleotides
which are primers for HERV-7q will be chosen from the
regions situated between nucleotides 3065 and 4390 and
nucleotides 6965 and 9550, and from any adjacent
sequence (upstream or downstream) capable of allowing
specific amplification (Figure 1), as specified above.
They are preferably selected from the group consisting
of:

a fragment of 1505 nt amplified by the pair
of primers SEQ ID NO: 30 and SEQ ID NO: 31 (primers G1F
and G1R),

a fragment of 2529 nt amplified by the pair
of primers SEQ ID NO: 38 and SEQ ID NO: 39 (primers E1F
and E1R).

The gene amplification step is in particular
carried out with the aid of one of the following gene
amplification techniques: amplification using
Qβ-replicase, PCR, LCR, ERA, CPR or SDA.

The subject of the present invention is also a
method of detecting transcripts as defined above,
characterized in that it comprises:

- collecting messenger RNAs obtained from
control biological samples (biological tissues, cells
or fluids) and from a similar sample collected from
patients, and

- the qualitative and/or quantitative analysis
of said mRNAs by in situ hybridization, by dot-blot,
Northern blotting, RNAse mapping or RT-PCR, with the
aid of a diagnostic reagent as defined above.

The subject of the present invention is also
products of translation, characterized in that they are
encoded by a nucleotide sequence as defined above.

The subject of the present invention is also a
peptide, characterized in that it is capable of being
expressed with the aid of a nucleotide sequence
selected from the group consisting of the sequences
SEQ ID NO: 1-24, as defined above.

Said peptide also includes the derived peptides
or polypeptides comprising between 5 and 540 amino
acids (SEQ ID NO: 25-29 and SEQ ID NO: 51 and their
fragments of at least 5 amino acids). Said peptides are
translated from the above defined nucleotide sequences,
according to the combination offered by usage of the
different possible open reading frames.

According to an advantageous embodiment of said
peptides they are in particular selected from the
sequences SEQ ID NO: 25-29 and SEQ ID NO: 51.

According to another advantageous embodiment of
said peptides, they are obtained from nucleic sequences
as defined above, in which at least one non-sense codon
may be replaced with a codon encoding one of the
following amino acids: Phe (F), Leu (L), Ser (S), Tyr
(Y), Cys (C), Trp (W), Gln (Q), Arg (R), Lys (K), Glu
(E) or Gly (G).
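One way to see where such a list of residues can come from is to enumerate the codons that differ from a stop codon by a single base. The sketch below does this with the standard genetic code; the single-substitution model is an assumption made here for illustration and is not stated in the text. Under that assumption the enumeration yields Leu, Ser, Tyr, Cys, Trp, Gln, Arg, Lys, Glu and Gly; Phe, which the text also lists, needs two base changes from any stop codon and therefore does not appear.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")


def single_substitution_rescues(stop_codon: str) -> set[str]:
    """Amino acids encoded by codons one base away from the given stop codon."""
    rescued = set()
    for position in range(3):
        for base in BASES:
            if base == stop_codon[position]:
                continue
            mutant = stop_codon[:position] + base + stop_codon[position + 1:]
            if CODON_TABLE[mutant] != "*":
                rescued.add(CODON_TABLE[mutant])
    return rescued


# Union over the three stop codons (illustrative model, not the source's method).
ALL_RESCUES = set().union(*(single_substitution_rescues(c) for c in STOP_CODONS))
```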

The invention thus includes the deduced
peptides or the deduced proteins corresponding to all
or part of the nucleic sequences described in the
invention, and optionally exhibiting modifications with
respect to the reference sequences described in the
invention, when they are expressed in some patients. In
particular, the invention includes the complete or
partial sequences obtained according to the 3 sense
reading frames and the 3 reverse and complementary
reading frames (SEQ ID NO: 22-24).
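The idea of reading a nucleic sequence in its 3 sense frames and its 3 reverse and complementary frames can be sketched as follows. This is a generic six-frame translation with the standard genetic code, given as an illustration; the function names and the frame labels ("+1" to "-3") are conventions chosen here, and the example is not tied to any sequence of the invention.

```python
from __future__ import annotations

from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse and complementary strand of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a non-sense codon, 'X' an unknown one."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translations for the 3 sense frames (+1..+3) and 3 reverse frames (-1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames
```

For the toy input "ATGAAATAG", frame +1 reads Met-Lys followed by a non-sense codon, while the other five frames give different, shorter products.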

Advantageously, the env protein of HERV-7q of
the invention has:

- N-glycosylation sites. The glycosylation of
the envelope proteins of retroviruses appears to be
directly associated with their functional properties,
for example by influencing the number of determinants
available in the T cells or by promoting recognition of
antigens by the T cells. Glycosylation could play a
role in the outbreak or the spread of a pathological
condition with an autoimmune component. The
glycosylations are necessary for maintaining the
conformation of certain epitopes, in particular during
the production of a recombinant envelope protein so as
to develop a diagnostic reagent and to promote the
efficacy of a possible vaccine. Positions 171, 210,
216, 236, 244, 283 and 411. Expected number at random:
3.2.

- prenylation sites. Prenylation is an
essential mechanism for attachment to the cell membrane
and for the targeting of certain proteins. This
targeting process could be essential for the production
of specific therapeutic agents capable of interfering
with the production and regulation of the traffic of
cellular complexes calling into play proteins involved
in the cell interactions, growth and movement.
Positions 188 and 290. Expected number at random: 1.8.

- targeting sites in the endoplasmic reticulum.
These sites could make it possible to bring about the
targeting toward the endoplasmic reticulum in order to
carry out the modifications necessary for promoting
membrane crossing. Positions 353 and 431. Expected
number at random: 0.2.

Said peptides or proteins can advantageously show
biological properties.
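Potential N-glycosylation sites of the kind listed above are commonly located by scanning a protein sequence for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below applies that widely used rule; it is an assumption that the positions reported in the text were derived with a comparable pattern, and the function name is hypothetical. Positions are returned 1-based.

```python
from __future__ import annotations

import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn residue of each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]
```

A sequon only indicates a candidate site; whether it is actually glycosylated depends on the expression system and on the local structure of the protein.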

The protein products generated by the
endogenous retroviral sequences or produced in parallel
may be advantageously characterized by micro-methods of
analysis and quantification of peptides and proteins:
HPLC/FPLC or equivalent, capillary electrophoresis or
equivalent, microsequencing techniques (Edman method or
equivalent, mass spectrometry and the like).

The subject of the invention is also antibodies
directed against one or more of the peptides described
above and their use for carrying out a method,
in particular a differential method, of in vitro
detection of the presence of such a sequence in an
individual.

Said antibodies are advantageously polyclonal
or monoclonal antibodies obtained by an immunological
reaction from a human, mammalian or avian organism or
other species toward the proteins, as defined above.

The subject of the present invention is a
method for the differential immunological screening of
normal or pathological human endogenous retroviral
sequences of the HERV-7q family, characterized in that
it comprises bringing a biological sample into contact
with an antibody according to the invention, the
reading of the result being visualized by an
appropriate means, in particular EIA, ELISA, RIA or
fluorescence.

By way of illustration, such an in vitro
diagnostic method according to the invention comprises
bringing a biological sample collected from a patient
into contact with antibodies according to the invention
and detecting with the aid of any appropriate method,
in particular with the aid of labeled
anti-immunoglobulins, the immunological complexes
formed between the proteins produced normally or
pathologically and the antibodies.
Monoclonal or polyclonal antibodies, produced
from antigens corresponding to synthetic peptides, or
recombinant polypeptides or proteins, make it possible
to monitor the expression of the peptides or proteins
produced normally or pathologically. The analysis is
preferably carried out by ELISA or equivalent, Western
blotting or equivalent, or by immunohistochemistry.

The peptides or proteins, derived from the
endogenous retroviral sequences or whose expression is
associated with the expression of these endogenous
retroviral sequences, are tested for and identified.

The subject of the present invention is also a
method for the identification and detection of
endogenous retroviral motifs which are abnormally
expressed in the context of pathological conditions
associated with cancer, or of neuropathological
conditions, in particular autoimmune neuropathological
conditions, at the forefront of which is multiple
sclerosis, characterized in that it comprises the
comparative analysis of the sequences extracted from a
biological sample and the sequences according to the
invention.

The subject of the present invention is also
the application of the nucleic sequences or of the
protein sequences according to the invention to the
diagnosis of, to the prognosis of, to the evaluation of
genetic susceptibility to, any induced, congenital or
acquired human diseases, in particular those with
cancerous, autoimmune and/or neurological components,
such as multiple sclerosis, the associated syndromes
and the neurodegenerative diseases in which all or part
of the nucleic sequences according to the invention and
related endogenous or exogenous forms are involved.

The subject of the present invention is also
hybrid nucleic sequences, characterized in that they
comprise nucleic sequences or motifs according to the
invention, combined with sequences or motifs of
endogenous origin or of exogenous origin or induced
exogenously.

The subject of the present invention is, in
addition, a recombinant cloning or expression vector,
characterized in that it comprises a nucleic sequence
in accordance with the invention.

In addition to the preceding arrangements, the
invention also comprises other arrangements which will
emerge from the description which follows, which refers
to exemplary embodiments of the method which is the
subject of the present invention as well as to the
appended drawings, in which:

- Figure 1. Human nucleic sequence HERV-7q,
whose analysis and treatment make it possible to
characterize a novel endogenous retroviral structure.
The repeat nucleic regions of type R1 and R2 and the
gag, pol and env domains are underlined. The gag and
env type domains are in italics. The region homologous
to a noncoding 3' portion of Rab7 is double underlined.

- Figure 2. Map of the human endogenous
retroviral region HERV-7q. The upper part of the figure
corresponds to an anonymous region of the human genome
situated on the long arm of chromosome 7. The repeat
domains (1), gag (2), pol (3) and env (4) of HERV-7q
can be identified. The C-terminal env region (4.3) is
prolonged upstream in the form of a long open reading
frame (4.2). The domain 4.1 corresponds to the
N-terminal region of the env domain.

- Figure 3. Comparison of the repeat nucleic
sequences situated at the boundaries of HERV-7q. The 5'
(top) and 3' (bottom) repeat nucleic regions are
compared and the identical bases are indicated by two
dots.

- Figure 4. Deduced sequence having an open
reading frame in the env-type domain of HERV-7q
according to the longest open reading frame rule.

- Figure 5. Sequences around the CKS-17 domain
identified in various deduced env domains of the
HERV-7q family and comparison with reference CKS-17
motifs.
1) HE2 - 2) HERV-7q - 3) GenBank accession No.:
M85205 - 4) HE7 - 5) HE9 - 6) CKS-17; the peptide motif
endowed with immunomodulatory properties is underlined
- 7) gp20 of retrovirus type D (SRV-Pc).

- Figure 6. Possible deduced sequence of the
gag-type domain identified in HERV-7q established
according to the longest open reading frame rule. "X"
and "/" correspond to a non-sense codon and to a reading
frame shift, respectively. The underlined sequence
corresponds to the beginning of the pol domain.

- Figure 7. Comparison of the nucleic regions
covering the gag region of HERV-7q (top) and HERV-TcR
(bottom) and their flanking regions. The identical
bases are specified by two dots.

- Figure 8. Example of nucleic alignments of
the env-type domain of HERV-7q with similar env-type
domains present in human endogenous retroviral
sequences of the same family. The non-sense codons are
underlined: 1) HERV-7q - 2) HE2 - 3) HE3 - 4) HE4.

- Figure 9. Nucleic alignments between the gag
domain of HERV-7q and the corresponding domains
belonging to the same family. Comparison with fragments
of gag domains isolated from infectious retroviral
agents. Sequences of infectious retroviral origin: EMBL
database accession No.: 1) A60168 - 2) A60201 - 3)
A60200 - 4) A60171. Human endogenous retroviral
sequences: 5) HERV-7q - 6) HG11 - 7) HG3. The figures
indicated in the endogenous sequences correspond to the
number of nucleotides inserted in order to optimize the
alignment with the gag-type sequences identified in
retroviruses of infectious origin.

- Figure 10. Alignment of a deduced gag protein
motif (top) belonging to an infectious retrovirus (EMBL
accession No.: A60200) with the deduced gag protein
motif (bottom) identified in HERV-7q. The non-sense
codons are in bold and underlined. The identical amino
acids are specified by 2 dashes. One dash indicates a
deletion or a homologous amino acid.

- Figure 11. Alignment of an env motif (top)
belonging to an infectious retrovirus (EMBL accession
No.: A60170) with the env motif (bottom) identified in
HERV-7q. The homologous nucleotides are specified by
two dots and the deletions by a dash.

- Figure 12. Comparison between the env domain
of HERV-7q (top) and the env domain of HERV-9 (bottom).
The 66% homology is limited to the 3' region of the env
domain of HERV-7q and HERV-9, respectively between
nucleotides 8976 and 9500 of HERV-7q and nucleotides
2898 and 3465 of HERV-9 (GenBank accession No.:
X57147). Numerous insertions/deletions are also
observed.

- Figure 13. Comparison between the env-type
domains from HERV-7q and from an infectious exogenous
retroviral sequence (EMBL accession No. A60170).

It should be clearly understood, however, that
these examples are given solely by way of illustration
of the subject of the invention and do not in any
manner constitute a limitation thereto.

EXAMPLE 1: Detection, by gene amplification, of a
nucleic sequence belonging to a domain of the gag or
env type according to the invention, in a genomic DNA
sample of human or mammalian origin

The gene amplification is carried out using
genomic DNA isolated from blood. An anticoagulant
treatment is carried out with 1 ml of a citrate
solution (per liter: 4.8 g of citric acid, 13.2 g of
sodium citrate, 14.7 g of glucose) per 6 ml of fresh
blood. After centrifugation of 20 ml of blood for
15 min at 130 000 g, the supernatant is removed and the
fraction enriched with white blood cells is transferred
into a new tube and then recentrifuged under the same
conditions as above. The fraction enriched with white
blood cells is resuspended in an extraction buffer
(10 mM Tris-HCl, 0.1 M EDTA, 20 µg/ml of pancreatic
RNAse treated so as to eliminate the DNAses, 0.5% SDS,
pH 8.0), and then incubated for 1 hour at 37°C.
Proteinase K is added at a final concentration of
100 µg/ml. The suspension of lysed cells is incubated
at 50°C for 3 hours, with occasional stirring, and then
treated with an equal volume of phenol equilibrated
with 0.5 M Tris-HCl, pH 8.0. The emulsion formed is
placed on a wheel for one hour and then centrifuged at
5 000 g for 15 min at room temperature. The aqueous
solution is treated and deproteinized by a triple
phenol extraction in order to obtain a level of
purification corresponding to a final A260/A280
absorbance ratio greater than 1.75. The aqueous
fraction is precipitated with 0.2 vol. of 10 M sodium
acetate and 2 vol. of ethanol. The DNA is then either
collected with the tip of a bent Pasteur pipette, or
centrifuged at 5 000 g for 5 min at room temperature.
The DNA or the DNA pellet is washed twice with 70%
ethanol and then taken up in 1 ml of TE, pH 8.0, so as
to be eluted, with gentle stirring, for 12 to 24 hours.
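The purity criterion above (A260/A280 greater than 1.75) lends itself to a short numerical check. The sketch below is illustrative: the threshold is taken from the text, while the conversion factor of 50 µg/ml per A260 unit for double-stranded DNA (1 cm path length) is the usual laboratory convention and is an assumption added here, as are the function names.

```python
# Threshold quoted in the text for an acceptable preparation.
PURITY_THRESHOLD = 1.75
# Usual convention for double-stranded DNA at a 1 cm path length
# (assumption added for illustration; not stated in the source).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 absorbance ratio of the DNA preparation."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def is_sufficiently_pure(a260: float, a280: float) -> bool:
    """True when the ratio exceeds the 1.75 threshold given in the text."""
    return purity_ratio(a260, a280) > PURITY_THRESHOLD


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimated concentration of the undiluted sample, in µg/ml."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor
```

For instance, a reading of A260 = 0.5 on a 20-fold dilution corresponds to about 500 µg/ml under this convention.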

Oligonucleotides specific for the endogenous
sequences described according to the invention are
chosen in order to amplify the gag or env region of the
endogenous retroviral regions described according to
the invention. The genomic DNA studied is obtained from
patients having pathological conditions such as
multiple sclerosis and from individuals reputed to be
healthy.

The thermostable DNA polymerases used were
chosen for their high accuracy during the amplification
process, such as Vent DNA polymerase (Biolabs) and the
like, and are used according to the conditions
recommended by the supplier.

The amplification strategy uses, depending on
the case, a simple PCR, or a nested or seminested PCR.

Oligonucleotides used to amplify the gag
region:

- primer G1F, sense, located in the region
upstream of the gag domain of HERV-7q (SEQ ID NO: 30),
- primer G1R, antisense, located in the 3'
terminal region of the gag domain (SEQ ID NO: 31).

The fragment of 1505 nt amplified by the pair
G1F-G1R is used to generate the probes capable
of hybridizing the various PCR amplification products.

- primer G2F, sense nested (SEQ ID NO: 32),
- primer G2R, antisense nested (SEQ ID NO: 33),
- primer G4F, sense nested (SEQ ID NO: 34),
- primer G3F, sense nested (SEQ ID NO: 35),
- primer G4R, antisense nested (SEQ ID NO: 36),
- primer G5R, antisense nested (SEQ ID NO: 37).

Oligonucleotides used to amplify the env region
of HERV-7q:

- primer E1F, sense (SEQ ID NO: 38),
- primer E1R, antisense (SEQ ID NO: 39).

The fragment of 2529 nt amplified by the pair
of primers E1F-E1R is used to generate the probes
capable of hybridizing the various PCR amplification
products.

- primer E5R (SEQ ID NO: 48),
- primer ExF (SEQ ID NO: 49),
- primer ExR (SEQ ID NO: 50).

The PCR is carried out using 50 to 200 ng of
genomic DNA. The PCR conditions are those recommended
by the supplier. The amplification is carried out in a
volume of 50 µl under the following cycle conditions:
denaturation at 94°C for 1 min, hybridization at 70°C
for 1 min, and extension at 72°C for 1 to 2 min,
depending on the amplified fragments. After 35 cycles,
a terminal reaction is carried out at 72°C for 10 min.
Automated sequencing of the amplified samples is
carried out with the aid of an Applied Biosystems type
ABI 377 sequencer or another comparable model,
according to the protocols provided by the
manufacturer.
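The cycling conditions above can be captured as a small data structure, which makes the total programmed hold time easy to compute. This is a sketch under stated assumptions: the class and function names are hypothetical, ramping times between temperatures are ignored, and no initial denaturation step is included because the text does not mention one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float
    minutes: float


def build_program(extension_minutes: float) -> dict:
    """Cycling program from the text; extension lasts 1 to 2 min per cycle."""
    if not 1 <= extension_minutes <= 2:
        raise ValueError("the text gives an extension time of 1 to 2 min")
    return {
        "cycles": 35,
        "cycle_steps": [
            Step("denaturation", 94, 1),
            Step("hybridization", 70, 1),
            Step("extension", 72, extension_minutes),
        ],
        "final_step": Step("terminal extension", 72, 10),
    }


def hold_time_minutes(program: dict) -> float:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.minutes for step in program["cycle_steps"])
    return program["cycles"] * per_cycle + program["final_step"].minutes
```

With a 1 min extension the holds add up to 115 min, and with a 2 min extension to 150 min; the real run time is longer because of ramping.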

In the case of a nested or seminested PCR, the
same experimental conditions are used, the only
difference being that the genomic DNA sequence is
replaced with 5 to 10 µl of the amplification product
derived from the first PCR.

Two independent amplifications are carried out 
using the same sample. A control reaction is carried 
out by replacing the DNA sample with water in order to 
detect possible contaminants. 

EXAMPLE 2: Detection, by gene amplification, of a
nucleic sequence according to the invention in a
biological sample of genomic DNA collected from
patients having an existing candidate pathological
condition or suspected of having this pathological
condition

The amplification protocol is the same as in
Example 1, apart from the origin of the sample which is
obtained from patients having a candidate pathological
condition. A genomic DNA sample reputed to be normal is
systematically integrated into the set of amplified
pathological samples and then analyzed.

The PCR products are separated on a 1.5%
agarose gel and then transferred in the presence of
0.4 N sodium hydroxide onto a charged nylon membrane.
Hybridization is carried out with a specific probe
corresponding to the PCR fragments amplified either
with the pair G1F-G1R or the pair E1F-E1R. The probe is
labeled by incorporating dUTP-digoxigenin according to
the supplier's protocol (Boehringer Mannheim). The
hybridization is carried out in a hybridization buffer
(5× SSC, 50% formamide, 0.1% lauroylsarcosine,
0.02% SDS, 2% blocking reagent Boehringer) overnight at
42°C. The Southern blot is washed twice for 5 min at
room temperature in a 2× SSC solution containing
0.1% SDS. Next, a high stringency wash is carried out
twice for 15 min at 55°C in a 0.1× SSC solution
containing 0.1% SDS. The hybridization is visualized
according to the supplier's protocol (Boehringer
Mannheim), in the presence of a chemiluminescent
substrate for alkaline phosphatase, of the CSPD or
CDP-STAR type. The filter is visualized after a 15 min
exposure at 60°C.
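The two wash solutions described above (2× SSC with 0.1% SDS, then 0.1× SSC with 0.1% SDS) are ordinarily prepared by diluting concentrated stocks. The sketch below applies the usual C1·V1 = C2·V2 relation; the 20× SSC and 10% SDS stock concentrations are assumptions chosen for illustration, since the text does not say how the solutions are made up, and the function name is hypothetical.

```python
def wash_recipe(
    total_ml: float,
    ssc_x: float,
    sds_percent: float,
    ssc_stock_x: float = 20.0,        # assumed stock strength, not from the source
    sds_stock_percent: float = 10.0,  # assumed stock strength, not from the source
) -> dict:
    """Volumes of SSC stock, SDS stock and water for one wash solution."""
    ssc_ml = total_ml * ssc_x / ssc_stock_x
    sds_ml = total_ml * sds_percent / sds_stock_percent
    water_ml = total_ml - ssc_ml - sds_ml
    if water_ml < 0:
        raise ValueError("stocks are too dilute for the requested solution")
    return {"ssc_stock_ml": ssc_ml, "sds_stock_ml": sds_ml, "water_ml": water_ml}


# The two washes named in the text, each prepared as 500 ml (volume chosen here).
low_stringency = wash_recipe(500, ssc_x=2.0, sds_percent=0.1)
high_stringency = wash_recipe(500, ssc_x=0.1, sds_percent=0.1)
```

Lowering the SSC concentration from 2× to 0.1× and raising the temperature to 55°C is what makes the second wash the more stringent of the two.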

SSCP (single strand conformation polymorphism)
analysis makes it possible to detect discrete
modifications of the sequence of the fragments
amplified by PCR. The PCR is carried out in the
presence of dCTP labeled with 32P. The sample to be
analyzed is denatured at 95°C for 10 min in the
presence of loading buffer, and then immediately loaded
onto a 10% polyacrylamide gel containing 7.5% glycerol.
The migration is carried out at 4°C at 8-10 W. The gel
is dried and then autoradiographed.

The PCR fragments likely to exhibit an
alteration of their nucleotide sequence are sequenced
according to Example 1.

Hybridization with the aid of a specific
oligonucleotide (17-mers to 20-mers) corresponding to
the modified nucleotide region makes it possible to
identify the samples having an identical modification
(ASO method). Briefly, the Southern blot is hybridized
with an oligonucleotide which is distally labeled
either with 32P, or in the presence of digoxigenin
(according to the Boehringer Mannheim protocol) and
then washed under stringent conditions at 65°C in a
6× SSC solution containing 0.05% sodium pyrophosphate.

EXAMPLE 3: Detection of a protein according to
the invention in a biological sample

- Preparation of a purified protein fraction of
cerebrospinal fluid from patients suffering from MS

After a treatment at 56°C for 30 min and
removal of the immunoglobulins on a HiTrap Protein G
column (Pharmacia), the equivalent of 10 ml of CSF is
deposited on a DEAE Sepharose CL-6B column (Pharmacia).
The elution is carried out in 20 mM Tris-HCl, pH 8.8,
and a gradient from 0 to 0.4 M NaCl, and then the
fraction is dialyzed twice against a phosphate-NaCl
buffer (PBS). After concentration on Ultrafree-MC
(Millipore), the fraction is deposited on a Superose 12
column (FPLC Pharmacia) and eluted in the presence of
PBS. After separation by polyacrylamide-SDS gel
electrophoresis and electrotransfer onto an Immobilon-P
membrane (Millipore), the protein bands are subjected
to controlled trypsin hydrolysis.

- Analysis of the protein fraction by mass
spectrometry

The peptides digested in the presence of
trypsin are analyzed by the MALDI-TOF method, which
allows the analysis of peptides present in a mixture
(COTTRELL J.S., Pept. Res., 1997, 7, 115-124). The
peptides characterized according to their mass are
compared with the proteins and with the associated
proteins according to the invention.
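The comparison of measured peptide masses with a candidate protein can be illustrated by an in silico tryptic digest. The sketch below uses the common rule that trypsin cleaves after Lys or Arg except when the next residue is Pro, and standard monoisotopic residue masses; it ignores missed cleavages and modifications, and its function names and the default tolerance are hypothetical choices, not part of the protocol described in the text.

```python
from __future__ import annotations

import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def matching_peptides(protein: str, observed: list[float], tol: float = 0.5) -> list[str]:
    """Predicted peptides whose mass lies within tol Da of an observed mass."""
    return [
        peptide
        for peptide in tryptic_peptides(protein)
        if any(abs(monoisotopic_mass(peptide) - mass) <= tol for mass in observed)
    ]
```

In practice the observed values would be the peak masses read from the MALDI-TOF spectrum, corrected for the charge-carrying proton.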

EXAMPLE 4: Detection of specific antibodies to the env
domain of HERV-7q

The identification of a long open reading frame
in the env sequence of HERV-7q made it possible to
determine a deduced protein sequence SEQ ID NO: 23, SEQ
ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29
of a region of the said gene referenced by SEQ ID NO:
22.

The protein sequences deduced from the
sequences SEQ ID NO: 23, 25, 27, 28, 29 are positioned
as follows with respect to Figure 1 or the sequence
ID NO: 3:

- SEQ ID NO: 23: beginning of the coding sequence:
position 7874; end of the coding sequence: 1st nonsense
codon (position 9493),

- SEQ ID NO: 25: beginning of the coding sequence:
position 7874; end of the coding sequence: 1st nonsense
codon (position 9493) (reading frame 1),

- SEQ ID NO: 27: beginning of the coding sequence:
position 6970; end of the coding sequence: 1st nonsense
codon (position 9493) (reading frame 1),

- SEQ ID NO: 28: beginning of the coding sequence:
position 6971; the end of the reading frame is shifted
depending on the case by 1, 2 or 3 codons,

- SEQ ID NO: 29: beginning of the coding sequence:
position 6972; the end of the reading frame is shifted
depending on the case by 1, 2 or 3 codons.

Various peptides corresponding to all or part
of SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID
NO: 28, SEQ ID NO: 29 were synthesized by genetic
engineering in order to test their antigenic
specificity toward sera or tissues from patients
suffering from MS, for example. Briefly, all or part of
the env region of HERV-7q is subcloned into the vectors
pQE30, 31 and 32. The vectors pQE30, 31 and 32 contain,
in 5' of the multiple cloning site, the consensus
sequences for transcription (the strong T5
bacteriophage promoter, 2 operators of the lactose
operon) and translation (one synthetic ribosome binding
site). Likewise, pQE30, 31 and 32 possess, in 3', the
phage λ transcription terminator as well as a Stop
codon for translation. The expression of the protein is
carried out after transformation in E. coli M15. The
plasmids pQE30, 31 and 32 possess, upstream of the
multiple cloning site, the coding sequence for a
succession of 6 histidines having affinity for nickel
ions. This stretch allows the purification of the
expressed chimeric protein by adsorption on a resin
consisting of a chelating ligand, nitrilotriacetic acid
(NTA), charged with 4 nickel ions (Ni-NTA resin,
Qiagen).

The transformation is carried out by
electroporation or treatment with calcium chloride. For
example, an E. coli M15 colony is incubated in 100 ml
of LB medium containing 250 µg of kanamycin, with
stirring at 37°C until an OD600 of 0.5 is obtained.
After centrifugation for 5 minutes at 2000 g at 4°C,
the bacterial pellet is taken up in 30 ml of TFB1
solution (100 mM rubidium chloride, 50 mM manganese
chloride, 30 mM potassium acetate, 10 mM CaCl2, 15%
glycerol, pH 5.8), at 4°C for 90 minutes. After a
centrifugation of 5 minutes at 2000 g at 4°C, the
bacterial pellet is taken up in 4 ml of TFB2 solution
(10 mM rubidium chloride, 10 mM MOPS, 75 mM CaCl2, 15%
glycerol, pH 8). The cells may be kept at -70°C in
aliquots of 500 µl. 20 µl of the ligation and 125 µl of
competent cells are mixed and placed on ice for
20 minutes. After a heat shock of 42°C for 90 seconds,
the cells are stirred for 90 minutes at 37°C in 500 µl
of Psi-broth medium (LB medium supplemented with 4 mM
MgSO4, 10 mM potassium chloride). The transformed cells
are plated on LB-agar dishes supplemented with 25 µg/ml
of kanamycin and 100 µg/ml of ampicillin, and the
dishes are incubated overnight at 37°C.

The potentially recombinant clones are
subcultured in an orderly manner on a nylon filter
deposited on an LB-agar dish supplemented with 25 µg/ml
of kanamycin and 100 µg/ml of ampicillin. After one
night at 37°C, the recombinant clones are located by
hybridization of the plasmid DNA with the nucleotide
probe amplified by PCR with the pair of primers
according to SEQ ID NO: 38 and SEQ ID NO: 39.

An independent colony containing the insert is
inoculated into 20 ml of LB medium supplemented with
25 µg/ml of kanamycin and 100 µg/ml of ampicillin.
After one night at 37°C, with stirring, 500 ml of the
same medium are inoculated at 1/50 with this preculture
and incubated until an OD600 of 0.8 is obtained, and
then 1 to 2 mM final of IPTG is added. After 5 hours,
the cells are centrifuged for 20 minutes at 4 000 g.

A portion of the cellular pellet is taken up
in 5 ml of sonication buffer (50 mM of sodium
phosphate, pH 7.8, 300 mM NaCl) and then placed on ice.
After rapid sonication, the cells are centrifuged for
20 minutes at 10 000 g. A portion of the cellular
pellet is taken up in 10 ml of a 30 mM Tris/HCl-20%
sucrose solution pH 8. The cells are incubated for 5 to
10 minutes, with stirring, after addition of 1 mM EDTA.
After a centrifugation of 10 minutes at 8 000 g at 4°C,
the pellet is taken up in 10 ml of 5 mM ice cold MgSO4.
After 10 minutes on the ice, with stirring, the cells
are centrifuged for 10 minutes at 8 000 g at 4°C.

The pellet is taken up at 5 ml/g in buffer A (6 M GuHCl (guanidine hydrochloride), 0.1 M
sodium phosphate, 0.01 M Tris/HCl, pH 8) for 1 hour at room temperature. The lysate is
centrifuged for 15 minutes at 10 000 g at 4°C, and the supernatant is supplemented with
8 ml of Ni-NTA resin, pre-equilibrated in buffer A. After 45 minutes at room temperature,
the resin is poured into a column, washed with 10 column volumes of buffer A and then
with 5 column volumes of buffer B (8 M urea, 0.1 M sodium phosphate, 0.01 M Tris/HCl,
pH 8). The column is washed with buffer C (8 M urea, 0.1 M sodium phosphate, 0.01 M
Tris/HCl, pH 6.3) until A280 is less than 0.01. The recombinant protein is eluted with
10 to 20 ml of buffer D (8 M urea, 0.1 M sodium phosphate, 0.01 M Tris/HCl, pH 5.9), then
with 10 to 20 ml of buffer E (8 M urea, 0.1 M sodium phosphate, 0.01 M Tris/HCl, pH 4.5),
and then with 20 ml of buffer F (6 M HCl, 0.2 M acetic acid). After SDS-PAGE analysis,
the purified fraction(s) containing the chimeric protein allowed the production of
antibodies in rabbits. The antibodies obtained are tested by Western blotting after
visualization with a secondary antibody coupled to alkaline phosphatase.
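
The buffer series A to F described above forms a stepwise pH gradient on the Ni-NTA column. A minimal sketch of that series as a data structure is given below; the class and field names are hypothetical, the compositions are transcribed from the text, and no pH is recorded for buffer F because none is stated.

```python
from typing import List, NamedTuple, Optional


class Buffer(NamedTuple):
    name: str
    denaturant: str
    ph: Optional[float]   # None where the text gives no pH
    role: str


# Compositions and roles transcribed from the purification paragraph.
STEPS: List[Buffer] = [
    Buffer("A", "6 M GuHCl", 8.0, "lysis, binding, wash (10 column volumes)"),
    Buffer("B", "8 M urea", 8.0, "wash (5 column volumes)"),
    Buffer("C", "8 M urea", 6.3, "wash until A280 < 0.01"),
    Buffer("D", "8 M urea", 5.9, "elution (10 to 20 ml)"),
    Buffer("E", "8 M urea", 4.5, "elution (10 to 20 ml)"),
    Buffer("F", "as printed: 6 M HCl, 0.2 M acetic acid", None, "final elution (20 ml)"),
]


def ph_is_non_increasing(steps: List[Buffer]) -> bool:
    """Check that the stated pH values never rise from one step to the next."""
    phs = [s.ph for s in steps if s.ph is not None]
    return all(a >= b for a, b in zip(phs, phs[1:]))


if __name__ == "__main__":
    for step in STEPS:
        print(step.name, step.denaturant, step.ph, step.role)
    print("pH steps non-increasing:", ph_is_non_increasing(STEPS))
```

Laying the steps out this way makes it easy to see that washing and elution proceed by lowering the pH from 8 to 4.5 under denaturing conditions.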

Antibodies are obtained in the same manner, using peptides synthesized chemically
according to the Merrifield technique (G. Barany and B. Merrifield, 1980, in The
Peptides, 2, 1-284, E. Gross and J. Meienhofer, Academic Press, New York).

The specific antibodies obtained are used for detection of the serum or tissue
expression of all or part of the endogenous retroviral sequences according to the
invention, in normal and pathological cases.

The proteins of serum or tissue origin are separated on acrylamide-SDS gel and then
transferred onto a nitrocellulose filter with the aid of a Novablot 2117-2250 apparatus
(LKB). The transfer is carried out on a Hybond C-extra sheet (Amersham) using a 100 mM
CAPS buffer pH 11, methanol, water (V/V/V: 1/1/8) containing 1 mM CaCl2. After a transfer
of 1 hour at 0.8 mA/cm2, the sheet is saturated for 1 hour at room temperature in
PBS-0.5% gelatin. The sheet is brought into contact with the specific antibody at a
dilution of 1/1 000 in PBS-0.25% gelatin. After 2 hours, the filter is washed 3 times
for 15 minutes in PBS-0.1% Tween-20, and then the filter is incubated for 30 minutes in
the presence of a secondary antibody coupled to alkaline phosphatase (Promega), diluted
1/7 500 in PBS-0.25% gelatin. After three washes in PBS-0.1% Tween-20, the filter is
equilibrated in a buffer (100 mM Tris-HCl, pH 9.5, 100 mM NaCl, 5 mM MgCl2). The
visualization is carried out in the presence of 45 µl of NBT at 75 mg/ml and 35 µl of BCIP
at 50 mg/ml, per 10 ml of alkaline phosphatase buffer.
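
The working concentrations implied by these dilutions can be estimated with a short calculation. This sketch assumes that the BCIP volume is in µl like the NBT volume, and that the antibody incubations use 10 ml of solution; neither the incubation volume nor the function names come from the text.

```python
def final_conc_mg_per_ml(stock_mg_per_ml: float, added_ul: float, buffer_ml: float) -> float:
    """Final concentration after adding added_ul of stock to buffer_ml of buffer."""
    added_ml = added_ul / 1000.0
    return stock_mg_per_ml * added_ml / (buffer_ml + added_ml)


def antibody_volume_ul(total_ml: float, dilution_factor: float) -> float:
    """Volume of antibody (in µl) needed for a 1/dilution_factor dilution in total_ml."""
    return total_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    # Substrate amounts per 10 ml of alkaline phosphatase buffer, as in the text.
    nbt = final_conc_mg_per_ml(75.0, 45.0, 10.0)
    bcip = final_conc_mg_per_ml(50.0, 35.0, 10.0)  # ASSUMPTION: 35 is in µl
    print(f"NBT  final: {nbt:.3f} mg/ml")
    print(f"BCIP final: {bcip:.3f} mg/ml")

    incubation_ml = 10.0  # ASSUMPTION: incubation volume, not stated in the text
    print(f"primary antibody (1/1 000):   {antibody_volume_ul(incubation_ml, 1000):.2f} µl")
    print(f"secondary antibody (1/7 500): {antibody_volume_ul(incubation_ml, 7500):.2f} µl")
```

Under these assumptions the substrates end up at roughly 0.34 mg/ml NBT and 0.17 mg/ml BCIP.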

The chimeric proteins obtained by genetic engineering are also used for tests of
biological activity, such as for example the test for biological activity of the
CKS-17-type peptide identified in the env domain of HERV-7q (Figure 5).

EXAMPLE 5: Production of ribonucleic probes encoding the env sequences of HERV-7q

The PCR fragments obtained are subcloned into the plasmid pGEM-4Z (Promega), which
possesses, on either side of its multiple cloning site, promoter sequences for the SP6
and T7 RNA polymerases.

The transformation method used is electroporation. The plasmid and the PCR fragment are
combined in a ratio of 50 ng of vector (SmaI cleavage) to 100 ng of PCR fragment (made
blunt ended by treatment with the Klenow fragment of DNA polymerase). The incubation
takes place overnight at 22°C in ligation buffer (66 mM Tris-HCl, pH 7.5, 5 mM MgCl2,
1 mM dithioerythritol, 1 mM ATP) in the presence of 1 U of T4 DNA ligase and is then
stopped by denaturation for 10 minutes at 65°C. In parallel, the E. coli JM 105 strain is
inoculated overnight at 37°C in LB medium. This preculture is diluted 1/500 and placed at
37°C until an OD 600 equal to 1 is obtained. For the remainder of the procedure, the
cells are always kept cold. After centrifugation for 5 minutes at 3 500 g at 4°C, the
cellular pellet is resuspended in 1/4 vol. of ultra-pure ice-cold water. This step is
repeated 5 to 6 times. The pellet is then resuspended in 1/4 000 vol. of water; 10% of
sterile glycerol is added, allowing preservation of the electrocompetent cells, in
aliquots of 10 µl at 20°C. 1 µl of the ligation is added to 50 µl of electrocompetent
cells; the mixture is subjected to an electrical discharge of 12.5 kV/cm, applied for
5.8 ms. The cells are rapidly resuspended in SOC medium, incubated for 1 hour at 37°C and
then plated in the presence of 2% X-Gal in dimethylformamide, and 10 mM IPTG, on an
LB-agar dish supplemented with ampicillin (100 µg/ml). After one night at 37°C, the
potentially recombinant white clones are subcultured in an orderly manner on an
LB/ampicillin dish and in parallel on a nylon filter deposited on an LB/ampicillin dish.
These two dishes are incubated overnight at 37°C. The recombinant clones are then located
by hybridization with a nucleic probe amplified by PCR with the pair of primers according
to SEQ ID NO: 38 and SEQ ID NO: 39 and labeled with digoxigenin.
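
Two quantities in the paragraph above lend themselves to a quick check: the insert-to-vector molar ratio corresponding to 100 ng of fragment per 50 ng of vector, and the voltage that produces a field of 12.5 kV/cm. The sketch below is illustrative only. The vector length (2746 bp), the insert length (1000 bp) and the cuvette gap (0.2 cm) are assumptions chosen for the example and are not given in the text.

```python
def insert_to_vector_molar_ratio(insert_ng: float, insert_bp: int,
                                 vector_ng: float, vector_bp: int) -> float:
    """Molar ratio of insert to vector; mass divided by length is proportional to moles."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def pulse_voltage_kv(field_kv_per_cm: float, gap_cm: float) -> float:
    """Voltage across a cuvette of the given electrode gap for a target field strength."""
    return field_kv_per_cm * gap_cm


if __name__ == "__main__":
    vector_bp = 2746   # ASSUMPTION: approximate size of the cloning vector
    insert_bp = 1000   # ASSUMPTION: hypothetical PCR fragment length
    ratio = insert_to_vector_molar_ratio(100.0, insert_bp, 50.0, vector_bp)
    print(f"insert:vector molar ratio = {ratio:.2f}:1")

    gap_cm = 0.2       # ASSUMPTION: electrode gap of the electroporation cuvette
    print(f"pulse voltage = {pulse_voltage_kv(12.5, gap_cm):.2f} kV")
```

Because the molar ratio scales inversely with insert length, a shorter fragment at the same mass gives a proportionally larger excess of insert over vector.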

The recombinant clones are cultured in 50 ml of LB/ampicillin medium (100 µg/ml), with
stirring, overnight at 37°C. After centrifugation at 3 500 g for 15 minutes at 4°C, the
bacterial pellet is taken up in 4 ml of P1 buffer (50 mM Tris-HCl, 10 mM EDTA, 400 µg/ml
RNase A, pH 8) and 4 ml of P2 buffer (200 mM NaOH, 1% SDS). The medium is incubated at
room temperature for 5 minutes. After addition of 4 ml of P3 buffer (2.55 M potassium
acetate, pH 4.8), the mixture is centrifuged at 12 000 g for 30 minutes at 4°C. The
supernatant is applied to a Qiagen type 100 column, pre-equilibrated with 2 ml of QBT
buffer (750 mM NaCl, 50 mM MOPS, 15% ethanol, pH 7). The column is washed twice with 4 ml
of QC buffer (1 M NaCl, 50 mM MOPS, 15% ethanol, pH 7) and the DNA is eluted with 2 ml of
QF buffer (1.2 M NaCl, 50 mM MOPS, 15% ethanol, pH 8). The DNA is precipitated with
0.8 vol. of isopropanol and centrifuged at 12 000 g at 4°C for 30 minutes. The pellet is
washed with ice-cold 70% ethanol and then the plasmid DNA is taken up in twice 150 µl of
TE buffer.

The ribonucleic probes are used as specific probes, in particular for the detection of
the transcripts expressed by the endogenous retroviral sequences according to the
invention.


SEQUENCE LISTING 



(1) GENERAL INFORMATION:
(i) APPLICANT: 

(A) NAME: INSTITUT NATIONAL DE LA RECHERCHE MEDICALE - INSERM

(B) ROAD: 101 RUE DE TOLBIAC 

(C) TOWN: PARIS 

(E) COUNTRY: FRANCE 

(F) POSTAL CODE: 75654 CEDEX 

(ii) INVENTION TITLE: NUCLEIC SEQUENCE AND DEDUCED PROTEIN SEQUENCE
FAMILY WITH HUMAN ENDOGENOUS RETROVIRAL MOTIFS, AND THEIR USES.

(iii) NUMBER OF SEQUENCES: 51 

(iv) COMPUTER READABLE FORM:

(A) SUPPORT TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS /MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (OEB)



(2) INFORMATIONS FOR SEQ ID NO: 1: env 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2599 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ATCCCCTGCC TTAATCGCCA AGCTCCTTCA GGAGAACAAA GAACAGGCCA TTACCCTGGA 60

GAAGACTGGC AACTGATTTT ACCCACAAGC CCAAACCTCA GGGATTTCAG TATCTACTAG 120

TCTGGGTAGA TACTTTCACG GGTTGGGCAG AGGCCTTCCC CTGTAGGACA GAAAAGGCCC 180

AAGAGGTAAT AAAGGCACTA GTTCATGAAA TAATTCCCAG ATTCGGACTT CCCCGAGGCT 240

TACAGAGTGA CAATAGCCCT GCTTTCCAGG CCACAGTAAC CCAGGGAGTA TCCCAGGCGT 300

TAGGTATACG ATATCACTTA CACTGCGCCT GAAGGCCACA GTCCTCAGGG AAGGTCGAGA 360

AAATGAATGA AACACTCAAA GGACATCTAA AAAAGCAAAC CCAGGAAACC CACCTCACAT 420

GGCCTGCTCT GTTGCCTATA GCCTTAAAAA GAATCTGCAA CTTTCCCCAA AAAGCAGGAC 480

TTAGCCCATA CGAAATGCTG TATGGAAGGC CCTTCATAAC CAATGACCTT GTGCTTGACC 540

CAAGACAGCC AACTTAGTTG CAGACATCAC CTCCTTAGCC AAATATCAAC AAGTTCTTAA 600

AACATTACAA GGAACCTATC CCTGAGAAGA GGGAAAAGAA CTATTCCACC CTTGTGACAT 660

GGTATTAGTC AAGTCCCTTC CCTCTAATTC CCCATCCCTA GATACATCCT GGGAAGGACC 720

CTACCCAGTC ATTTTATCTA CCCCAACTGC GGTTAAAGTG GCTGGAGTGG AGTCTTGGAT 780

ACATCACACT TGAGTCAAAT CCTGGATACT GCCAAAGGAA CCTGAAAATC CAGGAGACAA 840

CGCTAGCTAT TCCTGTGAAC CTCTAGAGGA TTTGCGCCTG CTCTTCAAAC AACAACCAGG 900

AGGAAAGTAA CTAAAATCAT AAATCCCCAT GGCCCTCCCT TATCATATTT TTCTCTTTAC 960

TGTTCTTTTA CCCTCTTTCA CTCTCACTGC ACCCCCTCCA TGCCGCTGTA TGACCAGTAG 1020

CTCCCCTTAC CAAGAGTTTC TATGGAGAAT GCAGCGTCCC GGAAATATTG ATGCCCCATC 1080

GTATAGGAGT CTTTCTAAGG GAACCCCCAC CTTCACTGCC CACACCCATA TGCCCCGCAA 1140

CTGCTATCAC TCTGCCACTC TTTGCATGCA TGCAAATACT CATTATTGGA CAGGAAAAAT 1200

GATTAATCCT AGTTGTCCTG GAGGACTTGG AGTCACTGTC TGTTGGACTT ACTTCACCCA 1260

AACTGGTATG TCTGATGGGG GTGGAGTTCA AGATCAGGCA AGAGAAAAAC ATGTAAAAGA 1320

AGTAATCTCC CAACTCACCC GGGTACATGG CACCTCTAGC CCCTACAAAG GACTAGATCT 1380

CTCAAAACTA CATGAAACCC TCCGTACCCA TACTCGCCTG GTAAGCCTAT TTAATACCAC 1440

CCTCACTGGG CTCCATGAGG TCTCGGCCCA AAACCCTACT AACTGTTGGA TATGCCTCCC 1500

CCTGAACTTC AGGCCATATG TTTCAATCCC TGTACCTGAA CAATGGAACA ACTTCAGCAC 1560

AGAAATAAAC ACCACTTCCG TTTTAGTAGG ACCTCTTGTT TCCAATCTGG AAATAACCCA 1620

TACCTCAAAC CTCACCTGTG TAAAATTTAG CAATACTACA TACACAACCA ACTCCCAATG 1680

CATCAGGTGG GTAACTCCTC CCACACAAAT AGTCTGCCTA CCCTCAGGAA TATTTTTTGT 1740

CTGTGGTACC TCAGCCTATC GTTGTTTGAA TGGCTCTTCA GAATCTATGT GCTTCCTCTC 1800

ATTCTTAGTG CCCCCTATGA CCATCTACAC TGAACAAGAT TTATACAGTT ATGTCATATC 1860

TAAGCCCCGC AACAAAAGAG TACCCATTCT TCCTTTTGTT ATAGGAGCAG GAGTGCTAGG 1920

TGCACTAGGT ACTGGCATTG GCGGTATCAC AACCTCTACT CAGTTCTACT ACAAACTATC 1980

TCAAGAACTA AATGGGGACA TGGAACGGGT CGCCGACTCC CTGGTCACCT TGCAAGATCA 2040

ACTTAACTCC CTAGCAGCAG TAGTCCTTCA AAATCGAAGA GCTTTAGACT TGCTAACCGC 2100

TGAAAGAGGG GGAACCTGTT TATTTTTAGG GGAAGAATGC TGTTATTATG TTAATCAATC 2160

CGGAATCGTC ACTGAGAAAG TTAAAGAAAT TCGAGATCGA ATACAACGTA GAGCAGAGGA 2220

GCTTCGAAAC ACTGGACCCT GGGGCCTCCT CAGCCAATGG ATGCCCTGGA TTCTCCCCTT 2280

CTTAGGACCT CTAGCAGCTA TAATATTGCT ACTCCTCTTT GGACCCTGTA TCTTTAACCT 2340

CCTTGTTAAC TTTGTCTCTT CCAGAATCGA AGCTGTAAAA CTACAAATGG AGCCCAAGAT 2400

GCAGTCCAAG ACTAAGATCT ACCGCAGACC CCTGGACCGG CCTGCTAGCC CACGATCTGA 2460

TGTTAATGAC ATCAAAGGCA CCCCTCCTGA GGAAATCTCA GCTGCACAAC CTCTACTACG 2520

CCCCAATTCA GCAGGAAGCA GTTAGAGCGG TCTCGGCCAA CCTCCCCAAC AGCACTTAGG 2580

TTTTCCTGTT GAGATGGGG 2599
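
Each line of a sequence description carries up to 60 bases in blocks of ten, followed by a running position counter. As a hedged illustration of how such lines can be checked against the declared LENGTH, the sketch below parses a listing line and tolerates counters or base blocks that text extraction has split with spaces. The function names are hypothetical and this is not part of the PatentIn software.

```python
import re
from typing import List, Optional, Tuple


def parse_listing_line(line: str) -> Tuple[str, Optional[int]]:
    """Return (bases, counter) for one sequence-listing line.

    Trailing all-digit tokens are joined into the counter, so a counter that
    extraction has split (for example "25 99") is read as 2599.
    """
    tokens = line.split()
    digits: List[str] = []
    while tokens and tokens[-1].isdigit():
        digits.insert(0, tokens.pop())
    counter = int("".join(digits)) if digits else None
    bases = "".join(tokens).upper()
    if not re.fullmatch(r"[ACGT]*", bases):
        raise ValueError(f"unexpected characters in sequence line: {line!r}")
    return bases, counter


def check_counters(lines: List[str], start: int = 0) -> bool:
    """Verify that each counter equals the number of bases read so far."""
    total = start
    for line in lines:
        bases, counter = parse_listing_line(line)
        total += len(bases)
        if counter is not None and counter != total:
            return False
    return True


if __name__ == "__main__":
    # Last two lines of SEQ ID NO: 1, the second with a split counter.
    tail = [
        "CCCCAATTCA GCAGGAAGCA GTTAGAGCGG TCTCGGCCAA CCTCCCCAAC AGCACTTAGG 2580",
        "TTTTCCTGTT GAGATGGGG 25 99",
    ]
    print(check_counters(tail, start=2520))
```

Applied to the two final lines of SEQ ID NO: 1, the check confirms that the last counter agrees with the declared length of 2599 base pairs.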

(2) INFORMATIONS FOR SEQ ID NO: 2: gag 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1326 base pair 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE : DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

GCCGCCTGGC ACTCCTGAGG GAAGTATAAA TTATAACACC ATCTTACAGC TAGACCTCTT 60

TTGTAGAAAA GGCAAATGGA GTGAAGTGCC ATAAGTACAA ACTTTCTTTT CATTAAGAGA 120

CAACTCACAA TTATGTAAAA AGTGTGATTT ATGCCCTACA GGAAGCCTTC AGAGTCTACC 180

TCCCTATCCC AGCATCCCCG ACTCCTTCCC CAACTAATAA GGACCCCCCT TCAACCCAAA 240

TGGTCCAAAA GGAGATAGAC AAAAGGGTAA ACAGTGAACC AAAGAGTGCC AATATTCCCC 300

AATTATGACC CCTCCAAGCA GTGGGAGGAA GAGAATTCGG CCCAGCCAGA GTGCATGTGC 360

CTTTTTCTCT CCCAGACTTA AAGCAAATAA AAACAGACTT AGGTAAATTC TCAGATAACC 420

CTGATGGCTA TATTGATGTT TTACAAGGGT TAGGACAATT CTTTGATCTG ACATGGAGAG 480

ATATAATGTC ACTGCTAAAT CAGACACTAA CCCCAAATGA GAGAAGTGCC ACCATAACTG 540

CAGCCTGAGA GTTTGGCGAT CTCTGGTATC TCAGTCAGGT CAATGATAGG ATGACAACAG 600

AGGAAAGAGA ATGATTCCCC ACAGGCCAGC AGGCAGTTCC CAGTCTAGAC CCTCATTGGG 660

ACACAGAATC AGAACATGGA GATTGGTGCT GCAGACATTT GCTAACTTGT GTGCTAGAAG 720

GACTAAGGAA AACTAGGAAG AAGTCTATGA ATTACTCAAT GATGTCCACC ATAACACAGG 780

GAAGGGAAGA AAATCCTACT GCCTTTCTGG AGAGACTAAG GGAGGCATTG AGGAAGCGTG 840

CCTCTCTGTC ACCTGACTCT TCTGAAGGCC AACTAATCTT AAAGCGTAAG TTTATCACTC 900

AGTCAGCTGC AGACATTAGA AAAAAACTTC AAAAGTCTGC CGTAGGCCCG GAGCAAAACT 960

TAGAAACCCT ATTGAACTTG GCAACCTCGG TTTTTTATAA TAGAGATCAG GAGGAGCAGG 1020

CGGAACAGGA CAAACGGGAT TAAAAAAAAG GCCACCGCTT TAGTCATGAC CCTCAGGCAA 1080

GTGGACTTTG GAGGCTCTGG AAAAGGGAAA AGCTGGGCAA ATTGAATGCC TAATAGGGCT 1140

TGCTTCCAGT GCGGTCTACA AGGACACTTT AAAAAAGATT GTCCAAGTAG AAGTAAGCCG 1200

CCCCCTCGTC CATGCCCCTT ATTTCAAGGG AATCACTGGA AGGCCCACTG CCCCAGGGGA 1260

CAAAGGTCCT CTGAGTCAGA AGCCACTAAC CAGATGATCC AGCAGCAGGA CTGAGGGTGC 1320

CTGGGG 1326



(2) INFORMATIONS FOR SEQ ID NO: 3: HERV-7q 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10499 base pair 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 3: 

CCCTGGGGCG GGCTTCCTTT CTGGGATGAG GGCAAAACGC CTGGAGATAC AGCAATTATC 60 

TTGCAACTGA GAG AC AG G AC TAGCTGGATT TCCTAGGCCG ACTAAGAATC CCTAAGCCTA 120 

GCTGGGAAGG TGACCACGTC CACCTTTAAA CACGGGGCTT GCAACTTAGC TCACACCTGA 180 

CCAATCAGAG AGCTCACTAA AATGCTAATT AG G C AAAG AC AGGAGGTAAA GAAATAGCCA 24 0 

AT CATC TAT T GCCTGAGAGC ACAGCAGGAG GGACAACAAT CGGGATATAA ACCCAGGCAT 300 

TCGAGCTGGC AACAGCAGCC CCCCTTTGGG TCCCTTCCCT TTGTATGGGA GCTGTTTTCA 3 60 

TGCTATTTCA CTCTATTAAA TCTTGCAACT GCACTCTTCT GGTCCATGTT TCTTACGGCT 420 

CGAGCTGAGC TTTTGCTCAC CGTCCACCAC TGCTGTTTGC CACCACCGCA GACCTGCCGC 4 80 

TGACTCCCAT CCCTCTGGAT CCTGCAGGGT GTCCGCTGTG CTCCTGATCC AGCGAGGCGC 54 0 

CCATTGCCGC TCCCAATTGG GCTAAAGGCT TGCCATTGTT CCTGCACGGC TAAGTGCCTG 600 

GGTTTGTTCT AATTGAGCTG AACACTAGTC ACTGGGTTCC ATGGTTCTCT TCTGTGACCC 660 

ACGGCTTCTA ATAGAACTAT AACACTTACC ACATGGCCCA AGATTCCATT CCTTGGAATC 720 

CGTGAGGCCA AGAACTCCAG GTCAGAGAAT ACGAGGCTTG CCACCATCTT GGAAGCGGCC 780 

TGCTACCATC TTGGAAGTGG TTCACCACCA TCTTGGGAGC TCTGTGAGCA AGGACCCCCC 840 

GGTAACATTT TGGCAACCAC G AAC G G AC AT CCAAAGTGGT GAGTAATATT GGACCACTTT 900 

CACTTGCTAT TCTGTCCTAT CCTTCCTTAG AATTGGAGGA AAATACCGGG CACTTGTCGG 9 60 

CCAGTTAAAA AC GAT TAG TG TGGCCACCGG ACTTAAGACT CAGGTGTGAG GCTATCTGGG 1020 

GAAGGGCTTT CTAACAACCC CCAACCCTTC TGGGTTGGGG ACTTGGTTTG CCTCAAGCCA 108 0 

GCTTCCACTT TCAGTTTTCT TGGGGAAGCC GAGGGCCGAC TAGAGGCAGA AAGCTGTCGT 114 0 

CCTGAACTCC CGGCAGTAGC CGGTTGAGAT CATGGTGTAG CCAGAAGTCT CAACAGTCGC 1200 

CCATGCATGC ACCCCTATCT TTCCTTCTGA CCCATACCTC CTGGGTCCCA AC C AC AAC T T 12 60 
TCTTCAAAGT GTAGCCCCAA AATTCTCCTT ACCTCTGAAT ATACTTCCTC TGATCCCTGC 1320

CTCCTAGGTA CTATTGGTTC AGACTTCCAT TTCCTCTAGC AAGTTGTATC TCCAAAGGGA 138 0 

TCTAAGGAAG CTCTGCGCTG CGTCCTTAGG CACCTAGGCT ATAACCCAGG GAGTCTTATC 14 4 0 

CCTGGTGTCC CTCCCAATTT AGGCATACAG CTCTTGACAT GGGCAGTTAT GTAGGACCCA 1500 

CTCCCCACCA CCCTTGCCAG GGCCCCAAGT TTGTAAATGG CTGAGGGAAA AGAGAGACAG 15 60 

AGGAGAGAGA GAGAAATGGA G G AG AAAG AG AGAGAGACAG AGAGGAGAGA GAGACAGTGA 1620 

GAGAGACAGA AGAGAGAGAG AGACAAAGAG GAGAGAGAGA GAGTCAAAGA GAGAAAGAAA 168 0 

GAGAAAGAAA TAGTAAAAAA CAGTGTGCCC TATTCCTTTA AAAGCCAGGG TAAATTTAAA 17 4 0 

ACCTGTACTT GATAATTGAA GGTCTTCTCT GTGACCCTAT AGCACTCCAA TCCACTTTGT 18 00 

GGTCAGTGTA AATAAGAGCA TAGGCCGAAA GCACTGAGGC CATTGACAAC CCGTAGCTTC 18 60 

CCTATCAAAA ATCCTTAACC CAGTAACCCG CAGATGGACC AAATGCATTC AGTCGGTAGC 1920 

GCAACTGCTT TGCTAAAAGT AGAAAAGTAA CTTTTAGAGG AAACCTCATT GTGAGCACAC 198 0 

CTCACCTGTT CAGAATTATT CTAATAAAAA AAGCAAAAAG GTAGCTTACT AACTCAAAAA 204 0 

TCTTAAAGTA TGGGGCTATT CTGTTAGAAA AAGGTAATGT AACTCCAACC ACTGATAATT 2100 

CCCTTAACCC AGCAGATTTC CTAACGGGAT TTAAATCTTA ATTACCATAC AAAGGTCCGA 2160 

CCAGACCTAG GCGGAACTCC CTTCAGGACA GGACGATAGA TGGTTCCTCC CAGGTGATTG 222 0 

AGGAAAAAAA CCACAATGGG TATTCAGTAA TTGATACGGG GACTCTTGTG GAAGCAGAGT 22 8 0 

TAGAAAAATT GCCTAATAAC TGGTCTCCTC AAACGTGTGA GCTGTTTGCA CTCAGCCAAG 2 34 0 

CCTTAAAGTA CTTACAGAAT CAAAAGACTA TCTCAATCCT GAT T C AAAAG GTTAGCTACA 2 4 00 

CCCTCTCTGT AATGCATTTG CATAAGAACT TGTTTATGGG AATGCATCTT GATGGGGCAG 2 4 60 

CTGGGTTGTT ATAAAATAGG AACCCAGCCC AGCTCTAGGA CTCACCCCTG AGCGCAAAGG 2520 

CAATGTTGGG CATGCTGGTA AAGGACCACT AGAATCCAGC AGCCCAGACC CCTTTCTTTG 2 580 

TGGTCAAGAA AGGCGGGAAA AGGGGTGCAG GACTGCTACA TCGGTAAGCA TAACTAATCC 2 640 

GATAAACAGA GGTCCATGGG TGGTTACGCA CCCTGGAAAG GAACTCACCC CTGAGCACAA 27 00 

AGGCAATGTT GGGCACGCTG GTAAAGGACC ACTAGAATCC AGCAGCCTGG ACCCCTTTCT 27 60 

TTGTGGTCAA GAGAGGCAGG AAAACAGGTG CAGGACTGCA ACATCAGTGA GCATAACTAA 2 820 

TTCGATAAGC AGAGGTCCAT GGGTGGTGAT GCACCCTGGA AAGAATAAGC ATTAGGACCA 2 8 80 

TAGAGGACAC TCCAGGACTA AAGCTCATCG GAAAATGACT AGGGTTGCTG GCATCCCTAT 2 94 0 

GTTCTTTTTT C AG AT G G G AA ACGTTCCCCG CAAGACAAAA ACGCCCCTAA GACGTATTCT 3000 

GGAGAATTGG GACCAATTTG ACCCTCAGAC ACTAAGAAAG AAACGACTTA TATTCTTCTG 30 60 
CAGTGCCGCC TGGCACTCCT GAGGGAAGTA TAAATTATAA CACCATCTTA CAGCTAGACC 3120

TCTTTTGTAG AAAAGGCAAA TGGAGTGAAG TGCCATAAGT ACAAACTTTC TTTTCATTAA 3180 

GAGACAACTC ACAATTATGT AAAAAGTGTG ATTTATGCCC TACAGGAAGC CTTCAGAGTC 32 4 0 

TACCTCCCTA TCCCAGCATC CCCGACTCCT TCCCCAACTA ATAAGGACCC CCCTTCAACC 3300 

CAAATGGTCC AAAAGGAGAT AGACAAAAGG GTAAACAGTG AACCAAAGAG TGCCAATATT 3360 

CCCCAATTAT GACCCCTCCA AGCAGTGGGA GGAAGAGAAT TCGGCCCAGC C AG AG T G CAT 34 2 0 

GTGCCTTTTT CTCTCCCAGA CTTAAAGCAA ATAAAAACAG ACTTAGGTAA ATTCTCAGAT 34 8 0 

AACCCTGATG GCTATATTGA TGTTTTACAA GGGTTAGGAC AATTCTTTGA TCTGACATGG 354 0 

AGAGATATAA TGTCACTGCT AAAT CAG AC A CTAACCCCAA AT G AG AGAAG TGCCACCATA 3 600 

ACTGCAGCCT GAGAGTTTGG CGATCTCTGG TATCTCAGTC AGGTCAATGA TAGGATGACA 3 660 

AC AG AG G AAA GAGAATGATT CCCCACAGGC CAGCAGGCAG TTCCCAGTCT AGACCCTCAT 3720 

TGGGACACAG AATCAGAACA TGGAGATTGG TGCTGCAGAC ATTTGCTAAC TTGTGTGCTA 37 8 0 

GAAGGACTAA GGAAAACTAG GAAGAAGTCT ATGAATTACT CAATGATGTC CACCATAACA 384 0 

CAGGGAAGGG AAGAAAATCC TACTGCCTTT CTGGAGAGAC TAAGGGAGGC AT T G AGGAAG 3 900 

CGTGCCTCTC TGTCACCTGA CTCTTCTGAA GGCCAACTAA TCTTAAAGCG TAAGTTTATC 3 960 

AC T CAG T CAG CTGCAGACAT TAGAAAAAAA CTTCAAAAGT CTGCCGTAGG CCCGGAGCAA 4 02 0 

AAC T TAG AAA CCCTATTGAA CTTGGCAACC TCGGTTTTTT ATAATAGAGA T CAG GAG GAG 4 080 

CAGGCGGAAC AGGACAAACG GGATTAAAAA AAAGGCCACC GCTTTAGTCA TGACCCTCAG 414 0 

GCAAGTGGAC TTTGGAGGCT CTGGAAAAGG GAAAAGCTGG GC AAAT T GAA TGCCTAATAG 4 200 

GGCTTGCTTC CAGTGCGGTC TACAAGGACA CTTTAAAAAA GATTGTCCAA GTAGAAGTAA 4 2 60 

GCCGCCCCCT CGTCCATGCC CCTTATTTCA AGGGAATCAC TGGAAGGCCC ACTGCCCCAG 4 320 

GGGACAAAGG TCCTCTGAGT CAGAAGCCAC TAACCAGATG ATCCAGCAGC AGGACTGAGG 4 38 0 

GTGCCTGGGG CAAGCGCCAT CCCATGCCAT CACCCTCACA GAGCCCTGGG TATGCTTGAC 444 0 

CATTGAGGGC CAGGAGGTTG TCTCCTGGAC ACTGGTGCGG TCTTCTTAGT CTTACTCTTC 4 500 

TGTCCCGGAC AACTGTCCTC CAGATCTGTC ACTATCTGAG GGGGTCCTAA GACGGGCAGT 4 5 60 

CACTAGATAC TTCTCCCAGC CACTAAGTTA TGACTGGGGA GCTTTATTCT TTTCACATGC 4 620 

TTTTCTAATT ATGCTTGAAA GCCCCACTAC CTTGTTAGGG AGAGACATTC TAG C AAAAG C 4 68 0 

AGGGGCCATT ATACACCTGA ACATAGGAGA AGGAACACCC GTTTGTTGTC CCCTGCTTGA 474 0 

GGAAGGAATT AATCCTGAAG TCTGGGCAAC AGAAGGACAA TATGGACGAG CAAAGAATGC 4 8 00 

CCGTCCTGTT CAAGTTAAAC TAAAGGATTC CACCTCCTTT CCCTACCAAA GGCAGTACCC 4 8 60 

CCTCAGACCC AAGGCCCAAC AAGGACTCCA AAAGATTGTT AAGGACCTAA AAGCCCAAGG 4 92 0 
CCTAGTAAAA CCATGCAGTA ACCCCTGCAG TACTCCAATT TTAGGAGTAC AGAAACCCAA 4980

CAGACAGTGG AGGTTAGTGC AAGATCTCAG GATTATCAAT GAGGCTGTTG TTCCTCTATA 504 0 

GCCAGCTGTA CCTAGCCCTT ATACTCTGCT TTCCCAAATA CCAGAGGAAG CAGAGTGGTT 5100 

TACAGTCCTG GACCTTCAGG ATGCCTTCTT CTGCATCCCT GTACATCCTG ACTCTCAATT 5160 

CTTGTTTGCC TTTGAAGATA CTTCAAACCC AACATCTCAA CTCACCTGGA CTATTTTACC 5220 

CCAAGGGTTC AGGGATAGTC CCCATCTATT TGGCCAGGCA TTAGCCCAAG ACTTGAGCCA 5280 

ATCCTCATAC CTGGACACTT GTCCTTCGGT AGGTGGATGA TTTACTTTTG GCCGCCCATT 534 0 

CAGAAACCTT GTGCCATCAA GCCACCCAAG CGCTCTTCAA TTTCCTCGCT ACCTGTGGCT 54 00 

ACATGGTTTC CAAACCAAAG GCTCAACTCT GCTCACAGCA GGT TACT TAG GGCTAAAATT 54 60 

ATCCAAAGGC ACCAGGGCCC TCAGTGAGGA AC AC AT C C AG CCTATACTGG CTTATCCTCA 5520 

TCCCAAAACC CTAAAGCAAC TAAGGGGATT CCTTGGCGTA ATAGGTTTCT GCCGAAAATG 5580 

GATTCCCAGG TATGGCGAAA TAGCCAGGTC ATTAAATACA CTAATTAAGG AAACTCAGAA 5 64 0 

AGCCAATACC CATTTAGTAA GAT GG AC AAC TGAAGTAGAA GTGGCTTTCC AGGCCCTAAC 5700 

CCAAGCCCCA GTGTTAAGTT TGCCAACAGG GCAAGACTTT TCTTCATATG T C AC AGAAAA 57 60 

AACAGGAATA GCTCTAGGAG TCCTTACACA GATCCGAGGG ATGAGCTTGC AACCTGTGGC 5820 

ATACCTGACT AAGGAAATTG ATGTAGTGGC AAAGGGTTGA CCTCATTGTT TACGGGTAGT 58 8 0 

GGTGGCAGTA GCAGTCTTAG TATCTGAAGC AGTTAAAATA AT AC AG GG AA GAGATCTTAC 594 0 

TGTGTGGACA TCTCATGATG TGAATGGCAT ACTCACTGCT AAAG GAG AC T TGTGGCTGTC 6000 

AGACAACTGT TTACTTAAAT GTCAGGCTCT AT T AC T T G AA GGGCCAGTGC TGCGACTGTG 60 60 

CACTTGTGCA ACTCTTAACC CAGCCACATT TCTTCCAGAC AAT G AAG AAA AGATAAAACA 6120 

TAACTGTCAA CAAGTAATTT CTCAAACCTA TGCCACTCGA GGGGACCTTT TAGAGGTTCC 6180 

TTTGACTGAT CCCGACCTCA ACTTGTATAC TGATGGAAGT TCCTTTGTAG AAAAAGGACT 624 0 

TCGAAAAGTG GGGTATGCAG TGGTCAGTGA TAATGGAATA CTTGAAAGTA ATCCCCTCAC 6300 

TCCAGGAACT AGTGCTCAGC TAGCAGAACT AATAGCCCTC ACTTGGGCAC TAGAATTAGG 6360 

AGAAGAAAAA AGGGCAAATA TATATACAGA CTCTAAATAT GCTTACCTAG TCCTCCATGC 64 2 0 

CCATGCAGCA ATATGGAAAG AAAGGGAATT CCTAACTTCT GAGAGAACAC CTATCAAACA 64 8 0 

TCAGGAAGCC AT TAG G AAAT TATTATTGGC TGTACAGAAA CCTAAAGAGG TGGCAGTCTT 654 0 

ACACTGCCGG GGTCATCAGA AAGGAAAGGA AAGGGAAATA G AAG AG AAC T GCCAAGCAGA 6 600 

TATTGAAGCC AAAAGAGCTG CAAGGCAGGA CCCTCCATTA GAAATGCTTA TAAAACAACC 6660 
CCTAGTATAG GGTAATCCCC TCCGGGAAAC CAAGCCCCAG TACTCAGCAG GAGAAACAGA 6720

ATGGGGAACC TCACGAGGAC AGTTTTCTCC CCTCGGGACG GCTAGCCACT GAAGAAGGGA 67 8 0 

AAATACTTTT GCCTGCAACT ATCCAATGGA AATTACTTAA AACCCTTCAT CAAACCTTTC 68 4 0 

ACTTAGGCAT CGATAGCACC CATC AG AT GG CCAAATCATT ATTTACTGGA CCAGGCCTTT 6900 

TCAAAACTAT CAAGCAGATA GTCAGGGCCT GTGAAGTGTG C C AG AG AAAT AATCCCCTGC 6960 

CTTATCGCCA AGCTCCTTCA GGAGAACAAA GAACAGGCCA TTACCCTGGA GAAGACTGGC 7020 

AACTGATTTT ACCCACAAGC CCAAACCTCA GGGATTTCAG TATCTACTAG TCTGGGTAGA 7 08 0 

TACTTTCACG GGTTGGGCAG AGGCCTTCCC CTGTAGGACA GAAAAGGCCC AAGAGGTAAT 714 0 

AAAGGCACTA GTTCATGAAA TAATTCCCAG ATTCGGACTT CCCCGAGGCT TACAGAGTGA 7200 

CAATAGCCCT GCTTTCCAGG CCACAGTAAC CCAGGGAGTA TCCCAGGCGT TAGGTATACG 72 60 

ATATCACTTA CACTGCGCCT GAAGGCCACA GTCCTCAGGG AAGGTCGAGA AAATGAATGA 7320 

AAC AC T C AAA GGACATCTAA AAAAGCAAAC CCAGGAAACC CACCTCACAT GGCCTGCTCT 7380 

GTTGCCTATA GCCTTAAAAA GAATCTGCAA CTTTCCCCAA AAAGCAGGAC TTAGCCCATA 7 4 40 

CGAAATGCTG TATGGAAGGC CCTTCATAAC CAATGACCTT GTGCTTGACC CAAGACAGCC 7 500 

AACTTAGTTG CAGACATCAC CTCCTTAGCC AAATATCAAC AAGTTCTTAA AAC AT T AC AA 75 60 

GGAACCTATC CCTGAGAAGA GGGAAAAGAA CTATTCCACC CTTGTGACAT GGTATTAGTC 7 620 

AAGTCCCTTC CCTCTAATTC CCCATCCCTA GATACATCCT GGGAAGGACC CTACCCAGTC 7 680 

ATTTTATCTA CCCCAACTGC GGTTAAAGTG GCTGGAGTGG AGTCTTGGAT ACATCACACT 77 4 0 

T GAG TC AAAT CCTGGATACT GCCAAAGGAA CCTGAAAATC CAGGAGACAA CGCTAGCTAT 7 8 00 

TCCTGTGAAC CTCTAGAGGA TTTGCGCCTG CTCTTCAAAC AACAACCAGG AGGAAAGTAA 7 8 60 

CTAAAATCAT AAATCCCCAT GGCCCTCCCT TAT CAT AT TT TTCTCTTTAC TGTTCTTTTA 7 920 

CCCTCTTTCA CTCTCACTGC ACCCCCTCCA TGCCGCTGTA TGACCAGTAG CTCCCCTTAC 7 98 0 

CAAGAGTTTC TATGGAGAAT GCAGCGTCCC GGAAATATTG ATGCCCCATC GTATAGGAGT 804 0 

CTTTCTAAGG GAACCCCCAC CTTCACTGCC CACACCCATA TGCCCCGCAA CTGCTATCAC 8100 

TCTGCCACTC TTTGCATGCA TGCAAATACT CATTATTGGA CAGGAAAAAT GATTAATCCT 8160 

AGTTGTCCTG GAGGACTTGG AGTCACTGTC TGTTGGACTT ACTTCACCCA AACTGGTATG 8 220 

TCTGATGGGG GTGGAGTTCA AG AT C AG G C A AGAGAAAAAC AT G T AAAAG A AGTAATCTCC 8280 

CAACTCACCC GGGTACATGG CACCTCTAGC CCCTACAAAG GACTAGATCT CTCAAAACTA 8 34 0 

CATGAAACCC TCCGTACCCA TACTCGCCTG GTAAGCCTAT TTAATACCAC CCTCACTGGG 8 4 00 

CTCCATGAGG TCTCGGCCCA AAACCCTACT AACTGTTGGA TATGCCTCCC CCTGAACTTC 84 60 

AGGCCATATG TTTCAATCCC TGTACCTGAA CAATGGAACA ACTTCAGCAC AGAAATAAAC 8 520 
ACCACTTCCG TTTTAGTAGG ACCTCTTGTT TCCAATCTGG AAATAACCCA TACCTCAAAC 8580

CTCACCTGTG T AAAAT T TAG CAATACTACA TACACAACCA ACTCCCAATG CATCAGGTGG 8 64 0 

GTAACTCCTC CCACACAAAT AGTCTGCCTA CCCTCAGGAA TATTTTTTGT CTGTGGTACC 87 00 

TCAGCCTATC GTTGTTTGAA TGGCTCTTCA GAATCTATGT GCTTCCTCTC ATTCTTAGTG 87 60 

CCCCCTATGA CCATCTACAC TGAACAAGAT TTATACAGTT ATGTCATATC TAAGCCCCGC 8820 

AACAAAAGAG TACCCATTCT TCCTTTTGTT ATAGGAGCAG GAGTGCTAGG TGCACTAGGT 888 0 

ACTGGCATTG GCGGTATCAC AACCTCTACT CAGTTCTACT ACAAACTATC TCAAGAACTA 8 94 0 

AAT GGGG AC A TGGAACGGGT CGCCGACTCC CTGGTCACCT TGCAAGATCA ACTTAACTCC 9000 

CTAGCAGCAG TAGTCCTTCA AAATCGAAGA GCTTTAGACT TGCTAACCGC TGAAAGAGGG 90 60 

GGAACCTGTT TATTTTTAGG GGAAGAATGC TGTTATTATG TTAATCAATC CGGAATCGTC 912 0 

ACTGAGAAAG TTAAAGAAAT TCGAGATCGA ATACAACGTA GAGCAGAGGA GCTTCGAAAC 918 0 

ACTGGACCCT GGGGCCTCCT CAGCCAATGG ATGCCCTGGA TTCTCCCCTT CTTAGGACCT 924 0 

CTAGCAGCTA TAATATTGCT ACTCCTCTTT GGACCCTGTA TCTTTAACCT CCTTGTTAAC 9300 

TTTGTCTCTT CCAGAATCGA AGCTGTAAAA CTACAAATGG AGCCCAAGAT GCAGTCCAAG 9360 

ACTAAGATCT ACCGCAGACC CCTGGACCGG CCTGCTAGCC CACGATCTGA TGTTAATGAC 94 20 

AT C AAAGGC A CCCCTCCTGA GGAAATCTCA GCTGCACAAC CTCTACTACG CCCCAATTCA 94 80 

GCAGGAAGCA GTTAGAGCGG TCTCGGCCAA CCTCCCCAAC AGCACTTAGG TTTTCCTGTT 954 0 

GAGATGGGGG AC T GAG AG AC AGGACTAGCT GGATTTCCTA GGCTGACTAA GAATCCCTAA 9 600 

GCCTAGCTGG GAAGGTGACC ACATCCACCT TTAAACACGG GGCTTGCAAC TTAGCTCACA 9 660 

CCTGACCAAT CAGAGAGCTC AC T AAAAT GC TAATTAGGCA AAGACAGGAG GTAAAGAAAT 97 20 

AGCCAATCAT CTATTGCCTG AGAGCACAGC AGGAGGGACA ATGATCGGGA TATAAACCCA 97 80 

AGTCTTCGAG CCGGCAACGG CAACCCCCTT TGGGTCCCCT CCCTTTGTAT GGGAGCTCTG 98 4 0 

TTTTCATGCT ATTTCACTCT ATTAAATCTT GCAACTGCAC TCTTCTGGTC CATGTTTCTT 9900 

ACGGCTTGAG CTGAGCTTTC GCTCGCCATC CACCACTGCT GTTTGCCGCC ACCGCAGACC 9960 

CGCCGCTGAC TCCCATCCCT CTGGATCATG CAGGGTGTCC GCTGTGCTCC TGATCCAGCG 10020 

AGGCACCCAT TGCCGCTCCC AATCGGGCTA AAGGCTTGCC ATTGTTCCTG CATGGCTAAG 10080 

TGCCTGGGTT CATCCTAATT GAGCTGAACA CTAGTCACTG GGTTCCATGG TTCTCTTCTG 10140 

TGACCCACAG CTTCTAATAG AGCTATAACA CTCACCGCAT GGCCCAAGGT TCCATTCCTT 10200 

GAATCCATAA GGCCAAGAAC CCCAGGTCAG AGAACACGAG GCTTGCCACC ATCTTGGGAG 102 60 

CTCTGTGAGC AAGGACCCCC AAGTAACACA AC CAT GAG GG TGCAAATGCA TGGGCCACTA 10320 
ATGGTAGAGC AAGAAAACAG AAGGGCCCTG GTTCCTCGAA GGCATCAGTG AGCTGAAATG 10380

CCTGCCCTGG ATGTCCTATT CCTAGGTGTT TTTCTGCCTG AAGCAGATTA AACCCTTTGT 10440

TCACTTCTCC AAGTAGGGCT TCTATTACAG CCCAAATCAA TCCCCACCCC AGATGACAT 10499



(2) INFORMATIONS FOR SEQ ID NO: 4: HE2 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2784 base pair 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

C T C C T T C AG G AGAAC AAAG AAC AG G C C AC T AC C C AAG AG AAG AC T G GC AAC T AGAT T T T AC C C AT AT GC C C AAAT C T C AG 
GGATTTCAGTATCTACTAGTTTGGGTAGATACTTTCACTGGTTGGGCAGAGGCCTTCCCCTGTAGGACAGAAAAGGCCCA 
AGAGGTAATAAACGTTCATGAAATAATTCCCAGATTCGGACTTCCCCTVAGGCTTACAGAGTGACAATGGCCCTGCTTTCA 
AGGCTACAGTAACCCAAGGAGTATCCCAGGTGTTAGGTATACAATATCACTCACACTGCGCCTGGAGGCCACAGTCCTCA 
GG AAAG G T GGAGAAAAT G AAC AAAAC AC T C AAAT G AC AT C T AAAAAAG C T AAT C C AG GAAAC C C AC C T C G C AT G G C C T G C 
TCTGTTGCCTATAGCCTTACTAAGAATCCGAAACTCTCCCCAAAAAGCAGGACTTAGTCCATACAAAATGCTGTATGGAC 
GGCCCTTCCTAACCAATGAACTTGGGCTTGACCGAGAGACAGCCAACTTAGTTGCAGACATCATCTCCTTAGCCAAATAT 
CAACAGGTTCTTAAAACATTACAGGGAGCCTGTCCCCAAGAAGAGGG7UVAGGAACTATTCCACCCTGGTGACATGGTATT 
AGTCAAGTCCCTTCCCTCTAATTCCCCATCCCTAGATACATCCTGGGAAGGAAACTACCCAGCCATTTTATCTACCCTAA 
CGGCAGTTTVAAGTGGCTGGAGCGGAGTCTTGGATACATCACACTCAAGTCAAACCCTGGATACTGCCAAAGGAACTCAAA 
AATCCATGAGACAATGCTAGCTATTCCTGTGAACCTCTAGAGGATCTGCGCCTGCTCTTCAAATGACAACCAGGGGGAAA 
GTAACTAAAATCGTAAATCCCCTGGCCCTCCCTTATCATATTTTTCTCTTTACTGTTCTCTTACCCCCTTTCACTCTCAC 
TGCACCCCGTCCATGCCACTGCACCCCGTCCATGCCCCGTCCATGCCAGTAGCTCCCCTTAGCAAGAGTTTCTATGGAGA 
ATGCAGCGTCCCGGAAATATTGATGCCCCATTGTATAGGAGTTTATCTAAGGGAACCCCCACCTTCACTGCCCACACCCA 
TATGCCCCACAACTGCTATAACTCTGCCACTCTTTGCATGCATGCAAATACTCATTATTGGACAGGAAAAACGATTAATC 
CCAGTTGTCCTGGAGGACTTGGAGGACTCACTTCACTCATACCAGTATGTCTGATGGGGGTGGAGTTCAAGATCAGGCAA 
C AGAAAAAC AC AT AAAG G AAG T AAT C T C C C AAC T GAC C T G G G T AC AT AG C AC CCCTGGCCC C T AC AAAG G AC T AGAT C T C 
TCAAAACTACATGAAACCCTCCATACCCATACTGGCCTGGTAAGCCTATTTAATACCACCCTGACTGGGCTCCATGAGGT 
CTCGGCCCAAAACCCTACTAACTGTTGGATGTGCCTCCCCCTGCACTTTAGGCCATACATTTCAATCCCTATACCTGAAC 
AATGGAACAACTTCAGCACAGAAATAAACACCACTTCTGTTTTAGTAGGTCCTCTTTCCAATCTGGAAATAACCCATACC 
TCAAACCTCACCTGTGT7VAAATTTAGCAATACTATAGACACAGCCAACTCCCAATGCATCAGGTGGGTAACTCCTCCCAC 
ACGAATAGTCTGCCTACCCTCAGGAATATTTTTTGTCTGTGGTACCTCAGCCTATCATTGTTTGAATGGCTCTTCAGAAT 
CTGTGTGCTTCCTCTCATTCTTAGTGGCCCCTATGCCCATCTACACTGAACAAGATTTATACAATCATGTCATACCTAAG 
CCCCGCAACAAAAGAGTACCCATTCTTCCTTTTGTTATTGGAGCAGGAGTGCTAGGCGGAGTAGCTACTGGCATTGGCGG 
TATCACAACCTCTACTCAGTTCTACTACAAACTGTCTCAAGAACTAAATGGTGACATGGAATGGGTCGCTGATACCCTGG 
TCACCTTGCAAGATCAACTTAACTCCCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCGGAA 
AGCGGGGGAACCTTTTTATTTTTAGAGGAAAAATGCTGTTGTTATGTTAATCAATCCGGAATCATCACCGAGAAAGTTAA 
AGAAATTCAAGGTCGAATATAACGTAGAGCAAAGGAGCTGCAAAACACTGGACCCTGGGGCCTCCTCAGCCAATGGATGC 
CCTGGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGTTACTCCTCTTTGGACCCTGTATCTTTAACCTCCTT 
GTTAAGTTTGTCTTTTCCAGAATCG7VAGCAGTAAAACTACAAATCGTTCTTCAAATGGAGCCCCAGATGCAGTCCATGAG 
TAAAATCTACCACGGACCCCTGGACCGGCCTGCTAGCCCATGCTCTGATGTTAATGACATCAAAGGCACCCCTCCCGAGG 
AAATCTCAACTGCACAACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGTGGTTGTTGGCCAACCTCCCCAACA 
GCAGTTGGGTTTTCCTGTTGAGAGGGGGGACTGAGAGACAGGAATAACTAGATTTCCTAGACCAACTAAGAATCCCTAAG 
ACTAGCTGGGAAGGTGACCGCTTCCACCTTTAAACACCGGGCTTGCAACTTAGCTCACGCCCAACCAATCAGATACTAAA 
GAGAGCTCACTAAAATGCTAATTAGGCAAAAACAGGAGATAAAGAAATAGCCAATCATCTGTTG 



(2) INFORMATIONS FOR SEQ ID NO: 5: HE3 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1799 base pair 
(B) TYPE: nucleotide

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GGGATTCTTAGTCGGCCTAGGAAATCCAGCTAATCCTGTCTCTCAGTCCCCCCACTCAACAGGAAAACCCAAGTGCTGTT 
GGGGAGGTTGGCTGACGACCAGTCTAACTGCTTCCTGCGGAATTGGGGCATAGTAGGGGTTGTGCAGTTGAGATTTCCTC 
GGGAGGGGTGCGTTCGATATCATTACAATTGGAGCATGGGCTAGTAGGCCGGTCCAGGGGTCCACGGTAGATCTTAGTCA 
TGGACTTCATCTGGGGTTCCATTTGAAGAACGATTTGTAGCTTTACAACTTTGATTCTGGAAGAGACAAACTTAACAAGG 
AGGTTAAAGATACAGGGTCCAAAGAGGAGTATCAATATTAGAGCTGCTAGAGATCCTAAGAAGGGGAGAATCCAGGGCAT 
CCATTGGCTGAGGAGGCCCCAGGGTCTGGTGTTTTTGAAGCTCCTCTGTTCTACGTTGTATTCAATCTCGAATTTCTTCA 
ACTTTCTCTGTGACAATTCAGGATTGATTAACATAATAACAACATTCTTCCGCTAAAATAACATAATAACAACATTCTTC 
CCCTAAAAATAAACAGCTTCCCCCTCTTTCAGAGGTTAGCAAGTCTAAAGCTCTTCAATTTTGAAGGACTACTGATGCTA 
GG7^VGTTAAGTTGATCTTGCAAGGTGACCAGGGAGTCGGCAACCCATTCCATGTCACCATTGAGTTCTTGAGATAGTTTG 
TAGTAGAACTGAGTAGAGGTTGTGGTACCGCCAATGCCAGT^ACCTAGTCCACCTAGCACTCCTGCTCCGATAACAAAAGG 
AAGAATGAGTACTCTTTTGTTGTGGGGCTTAGGTACAACATAATTGTATAAATCTTGTTCAGTGTAAATGGTCATGGGGG 
C AC T AAG AAT G AG AGGAAG C AC AT AGAT T C T G AAG AG C CAT T C AAAC AAC GAT AG G C T AAG G T ACC AC AG AC AAAAAAT A 
TTCCTGAGGGTAGGCAGACTATTCGTGTGGGAGGAGTTACCCACCTGATGCATTGGGAGTTGGTTGTGTCTACAGTATTG 
CTAAATTTTACACAGGTGAGGTTTGAGGTATGGGTTATTTCCAGATTGGAAACAAGAGGTCCTACTAAAACGGAAGTGGT 
GTTTATTTCTGTGCTGTAGTTGTTCCATTGTTCAGGTACAGGGATTGAAATGCATGGCCTGAAATACAGGGGGAGGCACA 
ACCAACAGTTAGTAGGGTTTTGGACCGAGACCTCATGGAGCCCAGTGAGGGTGGTATT7VAATAGGCTTACCAGGCAAGTA 
TGGGTATGGAGGGTTTCATGTAGTTTTAAGAGATCTAGTCCTTTGTAGGGGCTAGGGGTGCTATGTACCCGGGTCAGTTG 
GGAGGTTACTTCCTTTACATGTTTTTCTCTTGCCTGATCTTGAACTCCACCCCCCTCAGACATACCAGTATGGGTGAAGT 
AAGTCCGACAGACAGTGGCTCCAAGTCTTCCAGGACAACTAGGATTAATCATTTTCCCTGTCCAATAATGAGTATTTGCA 
TGCATGCAAAGAGTGGCAGAGTTATAGCAGTTGTGGGGCATATGGGTGTGGGCAGTGAAGGTGGAGTTTCCTTTAGGTAA 
ACTCCTATTTGATGGGGCATCAATATTTCTGGGAAGCCGCATTCTTCATAGAAACTCTTGGTAAGGGGAGCTGCTGGTTG 
TACAGCAGCATGGAGGGGGTGCAGTGAGAGTGAAAGGGGGTAAGAGAACAGTAAAGAGAAAAATATGATAAGGGAGGGCC 
ATGGGGATTTACGATTTTAGTTACTTTCCTCACGGTTGT 



(2) INFORMATIONS FOR SEQ ID NO: 6: HG3 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1489 base pair 

(B) TYPE: nucleotide 
(C) STRANDS NUMBER: single
(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

TGGTGCTTGC CCCGGGCACT CTCAGTCCTG CTGCTGGATC ATCTGGTTAG TGGCTTCTGA 60 

CTCAGAGGAC CTACGTCCCC TGGGGCAGTG GGCCTTACAG TGATTCCCTT GACACGAGGT 120 

GCATGGACGA GGGGGCGGCT TATTTCTATT TGGACAATCT TTTTTAAAGT GTCCTTGTAG 180 

ACCGCACTGG AAGCAAACCC TATTAGGCAT TTGATTTGCC TAGCTTTTCC CTTTTCCAGT 240

GCCTCCAAAG TCCGCTTGCC TGAGGGCCAT GACTAAAGCG GTGGCCTTTT TTTTATCCCA 300 

TTTGTCCCAT TCTGCCTGCT CATCCTGATC TCTATTATAA AAAACTGAGG TTGCCAAGTT 360



CAATAGGGTT TCTAAGTTTT GTTCCGGGCC TAAGGCAGAC TTTTGAAGTT TTTTCCTAAT 420

GTCTGTAGCT GACTGAGTGA TAAACTTATC CTTTAAGATT AGTTGGCCTT CAGTAGAGTC 480

AGTTGACAGA GAGAGGTATG CTTCCTCAAT GCCTCCGTTA GTCACTCCAG AAAGGCGGTA 540

GGATTTTCTT CCTTTCCCTG TGTTATAGTG GACATCATTG AATAACTCAC AGGCTTCTTT 600 

CTAGTTTTCC TTAGTCCTTC TAGCACGCAA GTTAGCAAAT GTCTGCGGCA CCAATCTCCA 660 

TGTTCTGATT CTGTGTCCCA GTGAGGGTCT ACACTGGGAA CTGCCTGCTG GCCTGTGGGG 720

AATCGTTCTC TTTCCTCTGT TGTCGACCTA TCATTGACCT GACTGAGATA CCAGAGATCG 780 

CCAAACTCTC AGGCTGCAGT TACGGCGACA CTTCTGTCAT TTGGGGTTAG TGTCTGATTT 840

AGCAGTAACA TTATATCTCT CCATATCAGA TCAAAGGATT GTCCTAAACC TTGTAAAACA 900 

TCAATATAGC CATTAGGGTT ATCTGAGAAT TTACCTAGGT CTATTTTAAT TTAAAGTCTG 960

GGAGAGAAAA AGGCACATGC ACTCTGGCTG GGCCGAATTC TCTTCCTCCC ACTGCGTCTG 1020 

AGAGAGAAAA AGGTACGTGC ACTCTGGCTG GGCCGAATTC TCCTCCCACC GCTTGGAGGG 1080 

GGCATAATCG GGGAATATTG GCATTCTTTG GTTAGTTGTT TACCCCTTTG TCTATCTCCT 1140

TTTGGACCGT TTGGGTTGAA GGGGGGTCCT TATTATTTGG GGAAGGAGTC TGGGGGATGC 1200 

TGGGGTAGGG AGGTAGACTC TGAGGGCTTC CTGTAGGGCA TAAATCACAC TTTTTACATA 1260

ATTGCGAGTT GTCTCTTAAT GAAAAGAAAG TTTGTACGTA TGACACTTCA CACCATTTGC 1320 

CTTCTTTTCT ACAAAAGAGG TCTAGCTGTA AGATGGTGTT ATAATTTATG CTTCCCTCAG 1380

GATGCCAGGT TTCTCCCCCT TAAAGAGTAT ATCGTTGCCA GGCGGTACTG CAGAAGAATA 1440

TGTCTTTTTT TTCTTAGCAT CTGAGAGTCA AATTGGTCCC AATTCTCCA 1489

(2) INFORMATIONS FOR SEQ ID NO: 7: HE4 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1216 base pairs

(B) TYPE: nucleotide

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

TAAAGATACA GGGATTGAAA TGTATGGCCT GAAGTGCAGG GTCATATAGG TGTGGGTGGT 60

GAAAATGGGG TTTCCTTTAG AAAAACTCCT ATACGATGGG TCATCAATAT TTCCAGGAAG 120 

CCGCATTCTC CATAGAAGCT CTTGGTAATG GGAGCTACTG GTAGTACAGT GGCATGGAGG 180

GGGTGCAGTG AGAGTGAAAG AGGGTAAAAG AACAGTAAAG AGAAAAATAT GATAAGGGAG 240

GGGTTCAGTG AGAGTGAAAG GGGGTAAGAG AACAGTAAAG AAAAAAATAT GACAAGGAGG 300

GCCATGAGGA TCTACGATTC TAGTTACTTT CCTCACGGTT GTCGCTTGAA GAGCAGGTGC 360

AGATCCTCTA GAGGTTCACA GGAATAGCTA GCGTTGTCTC CTGGATTTTC GGGTTCCTTT 420

GGCAGTATAC AGAGTTTGAC TCGAGTGTGA TGTATTCAAG ACTCCACTCC AGCCACTTTA 480

ACCGCAGTTG GGGTAGATAA AATGACTGGG TAGGGTCCTT CCCAGGATGT ATCTAAGGAT 540

GGGGACTTAG AAGGAAGGGA CTTGACTAAT ACCATGTCAC CAGGGTGCAA TAATTACTTT 600 

CCCTCTTCTC GGGAACAGGT TCCCTGTAAT GTTTTAAGAA CTTGTTGATA TTTGGCCAAG 660 

GAGGTGATGT CTGCAACTAA GCTGGCCATC TCTCGGTCAA GCACAAGGTC CTTGGTTAGG 720 

AAGGGCCATC CATACAGCAT TTTGTATGGG CTAAGTCCTG CTTTTTGGGG AGAGTTTTGG 780

ATTCTTAGTA AGGCTGTAGG CAACAGAGCA GGCCATGCAA GGTGGGTTTC TTGGGTTAGC 840

TTTTTTAAAT GTCGTTTGAG TGCTTCATTC ATTTTCTTGA CTTTTCCTGA GGATTGTGGC 900 

CTCCACGCGC AGTGTAAGTG ATATTGTATG CCTAATGCCT GGGATACTCC CTGGGTTACT 960

GTAGCCTTGA AAACGGGGCC ATTGTCACTC TGTAAGCCTC GGGGAAGTCC GAATCTGGGA 1020 

ATTATTTCAT GAATTAGTGC CTTTATTACA TCTTGGTCCT TTTCTGTCCT ACAAAGGAAG 1080

GCCTCTGCCC AACCAGTGAA AATATCTACC CAGACTAGTA GATACTGAAA TCCCTGAGAT 1140

TTGGGCATGT GGGTAAAATC TAGTTGCCAG TCTTCTCCTG AGTAATGGCC TGTTCTTTGT 1200 

TCTCCTGAAG GAGCTT 1216 
(2) INFORMATIONS FOR SEQ ID NO: 8: HE5 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 976 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AGTGATAATG GAATACTTGA AAGTAATCCC CTCACTCCAG GAACTAGTGC TGAGCTGGCC 60 

AAACTAATAG CCCTCACTCG GGCACTAGAA TTAGGAGAAG AGAAAAGGGT AAATATATAT 120

ACAGACTATA AGTATGCTTA CCTAGTCCTT CATGCCCATG CAGCAATATG GAGAGAAAGG 180

GAATTCCTAA CTTCCAAAGG AACACCTATC AAACATCAGG AAGCCATTAG GATATTATTA 240

TTGGTGGTAC AGAAACCTAA AGAGGTGGCA GTCCTACACT GCTGGGGTCA TCAGAAAAAA 300 

AAGGAAAGGG AAATAGAAGG GAACTACCAA GCAGATATTG AAGCCAAAAG AGCCGCAAGG 360

CAGGACCCTC CATTAGAAAT GCTTATAGAA GGACCCCTAG TGTGGGGTAA CCCCCTCCAG 420

GAAAGCAATC CCCAGTACTC AGCAGGAGAA ATAAAATGGA GAACCTCACG AGGACATACT 480

TTCCTCCCCT CAGGATGGCT AGCCACCAAA GAAGGAAAAA TGCTTTTGCC TGCAGCTAAC 540

CAATGGAAAT TACTTAAAAC CCTTCACCAA ACCTTTCACT TAGGATTGAT AGCACCCATC 600

AGATGGCCAA ATTATTATTT ACTGGATCAG GCCTTTTCAA AACTATCAAG CAGGTAGTCA 660 

GGGCCTGTAA AGTGTGCCAA AGAAATAATC TCCTGCACTG CAAGCCATAC ATTTCAATCC 720 

CTGTATCTTT AACCTCCTTG TTAAGTTTGT CTCTTCCAGA ATCAAAGCTG TAAAACTACA 780

AATGGTTCTT CAAATGGAGT CTCAGATGCA GTCCATGACT AAGATATACC GCAGCCCCCT 840

GGAGGGGGCC TGCTAGCCCA TGCTCCAATG TTAATGACAT CGAAGGCACC CCTCCCGGGG 900

AAATCTCAAC TGCACAACCC CTACTATGTC CCAATTCAGC AGGAAGCAGT TAAAGCGGTC 960

ATCGGCCAAC CTCCCC 976
(2) INFORMATIONS FOR SEQ ID NO: 9: HE6 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 942 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

AGAGGAGAAC AGCAGCATAA GCGGCTGGCA GAGGTAGGGA AAGACCAGCA AGAAGAAAAG 60

AGAGAAAGAG AAAGAGAAAG TCAGAGAAAG AGACAGAGAG AGGAAGAGAC AAAGAGACAG 120

AAAGTCAAAG AGGTAGTAGT CAGAAACAGA GACAAAAAAA AGGAGTCAGA AAGAGGGACA 180

GACACAGAAA GTCAAAAAAA AAGTTAAGAA GAAAGGAAAA GACAAAGAAG AAGTCGAAGA 240

GGAGAAAGAG AGAGATAGAA GTAGTAAAGA AAAAAACAGC ATATCCCATT CCTTTAAAGC 300

CAGGGTAAAT TTCTATCTAC CCAGCCAAGG CATATTCTAC TTATGTGGAT CTTCAACCCA 360 

TATCTGCCTC TCAGACAGTT TGCAAGAAAT AATGAAATCT ATCCTTACTT TACAATCCCA 420

AATAGACTCT TTGGCAGCAG TGACTCTCCA AAACTGCAGA GGCCTAGACC TCCTCACTGC 480

TGAAAAAGGA GGACACTACA CCTTCTTAGG GGAAGAATGT TGTTTTTACA CTAACCAGTC 540

GGGGATAGTA TGAGATGCTG CCCGGAGTTT ACAGGAAAAG GCTTCTGAAA TCAGACAACG 600 

CCTTTCAAAT TCTTATACCA ACTTCTGGAG TTAGGCAACA TGGCTTCTCC CCTTTCTAGG 660

TCCTGTGGCA GCCATCTTGC TGTTACTCGC CTTTGGGCCC TGTATTTTTA ACCTTCTTGT 720 

CAAATTTGTT TCCTCTAGAA TCGAGGCCAT CAAGCTACAG ATGGTCTTAC AAATGGAACC 780

CCAAAAGAGT TCAACTAACA ACTTCTACCG AGGACCCCTG GATCAACCCA CTGGCACTTC 840

CCCTGGCCTA GAGAGTTCCC CTCTGAAGGA CACCGCAACT GCAGGGCCCT TCTTTGCCCC 900

ATCCAGCAGG AGTAGCTAGA GTGGTCATCG GCCAAATTGC CA 942
(2) INFORMATIONS FOR SEQ ID NO: 10: HG6

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1375 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

CCCCAATATT CTCTTTCTGA TGGGGAAAAA TGGCCACCTG AGGGAAGCAC AAATTACAAT 60 

ACTATCCTGC AGCTTGATCT TTTCTGTAAG AGGGAAGGCA AATGGAGTGA AATACCTTAT 120

GTCCAAGCTT TCTTTTCATT GAGGGAGAAT ACACAACTAT GCAAAGCTTG CAATTTACAT 180 

CCCACAGGAG GACCCCTCAG CTTACCCCCA TATCCTAGCC TCCCTATAGC TTCCCTTCCT 240

ATTGATGATA CTCCTCCTCT AATCTCCCCT GCCCAGAAGG AAATAAGCAA AGAAATCTCC 300 

AAAGGTCCAC AAAAACCCCC GGGCTATCGG TTATGTCCCC TTCAAGCTGT AGGGGGAGGG 360 

GAATTTGGCC CAACCCGGGT GCATGTCCCC TTCTCCCTCT CTGATTTAAA GCAGATCAGG 420

CAGACCTGGG GAAGTTTTCA GATGATCCTG ATAGGTACAT AGATGTCCTA CAGGGTCTAG 480

GGCAAACCTT TGACCTCACT TGGAGAGACG TCATGCTACT GTTAGATCAA ACCCTGGCCT 540

TTAATGAAAA GAATGCGGCT TTAGCTGCAG CCTGAGAGTT TGGAGATACC TGGTATCCTA 600 

GTCAAGTAAA TGAAAGAATG ACAGCCGAAG AAAGGGACAA CTTCCCTACT GGTCAGCAAG 660 

CCATCCCCAG TATGGATCCC CACTGGGACT TTGACTCAGA TCATGGGGAC TGGAGTCGTA 720 

AACATCTGTT GATCTGTGTT CTGGAAGGAC TAAGGAGAAT TGGGAAAAAG CCCATGAATT 780 

ATTCAATGAT ATCCACCATA ACCCAGGGAA AGGAAGAAAA TCCTTCTGCC TTCCTCGAGC 840

GGCTACAAGA GGCCTTAAGA AAATATACTC CCCTGTCACC CGAATCACTC GAGGGTCAAT 900 

TGATTCTAAA AGATAAGTTT ATTACCCAAT CAGCCACAGA TATCAGGAGA AAGCTCCAAA 960 

AGCAAGCCCT GAGCCCTGAA CAAAATCTAG AGACATTATT AAACCTGGCA ACCTTGGTGT 1020 

TCTATAATAG GGACCAAGAG GAACAGGCCC AAAAGGAAAA GCGAGATCAG AGAAAGGCCG 1080 

CAGCCTTAGT CATGGCCCTC AGACAAACAA ACCTTGGTGG TTCAGAGAGG TCAGAAAATG 1140

GAGCAGGCCA ATCACCTGGT ACGGCTTGTT ATCAGTGCGG TTTACTAGGA CACTTTAAAA 1200

AAGATTGTCC AATAAGAAAC AAGCTGCCCC CTCATCCGTG TCCACTATGC CGAGGCAATC 1260

ACTGGAAGGT GCACTGCCCC AGAGGATGAA GGTTCCCTGG GTTAGAAGCC CCCAACCAGA 1320

TGATCCAACA ACAGGACTGA GGGTGCCCGG GGCAAGCACC AGCTCATGTC ATCAC 1375 
(2) INFORMATIONS FOR SEQ ID NO: 11: HE7 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 944 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ACCTAGGAGG AACTGTCTTC AGGACAGGAC TATAGATGCT TCCTCCCAGG CGATTAAGGG 60

AAAAAGACAC AATGGGTATT CAGTAAGTGA TAAGGAAACT CTTGTAGAAG CAGAGTTAGG 120 

AAAATTGCCT AATAATTGGT CTGCTCAAAT GTGCGAGCTG TTTGCACTCA GCCAAACCTT 180 

AAAAGTATTA CAGAATCAGG AAGAAGCCAT CTATACCAAT TCTAAGTTAA TATGGACTGA 240

ACGAGAACTT ATTAATAGCA AAGAATAATT GAAATCCCAA ACTTACAAGG TTTTCAACAA 300

AAGCACAGTT TGCTAAAAGT TAACTGTGTA ACATGTATTA TCCTACTACC ACAAACTCTC 360

AAATGATTTC TCAGACAGTT TGCAAGAAAC AATGAAACCT ATCCTTACTC TACAATCCCA 420

AATAGACTCT TTGGCAGCAG TGACTCTCCA AAACCACCAA GGCCTAGACC TCCTCACTGC 480

TGAGAAAGGA GGACTCTGCA CCTTCTTAGG GGAAGATTGT TGTTTTTACA CTAACCAGTC 540

AGGGATAGTG TGAGATGCCA CCCAGCGTTT ACAGGAAAAG GCTTCTGAAA TCAGACACAA 600

TGCTTTTCAA ACCTTATAGC AACCTCTGGA GTTCGGCGAC TGGCTTTTCC CCTTTCTAGG 660 

TCCTGTGACA GCCATCTTGC TATTACTCGC CTTCGGGCCC TGTATTTTTA ACCTCCTCGT 720 

CAAATTTGTT TCCTCTAGGA TCGAGGCCAT CAAGCTACAG ATGGTCTTAC AAATGGAACC 780 

CCAAATGAGC TCGACTAACA ACTTCTACTG AGGACCCCTG GACCGACCCA CTGGCCCTTT 840

AACTGGCTTA AAGAGTTTCC CTCTGGAGGA CACTACAACT GCAGGGCCCC TTCTTTGCCC 900

CATCCACAGG AAGTTAGCTA GAGCAGTCAT CACCCAATTC CCAA 944
(2) INFORMATIONS FOR SEQ ID NO: 12: HE8 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 963 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TACAGGAACC CCATAATACG TCCTTGGCAA ATTCTATTCA GCTCCAACTG CTAGGAGTGG 60 
CCCATTTGTC CTGAACCCTC AAATCATGGG AATGAGAAAT GAATTTAGAC TGACCACAGC 120

CCTTATGAGT TTTCAGCTAC AGGGGTGTAT AGAACCCTGA TAAGGAGTTT TCTTTGTGTG 180

TGGAAGATCC TTCTATATTT GCCTCCCCAC CAACTGGACA GGAACTTGTA CTTTAGCCTA 240

CATAGTACCT CCTGTGACTT ATCCTTTTCA GAAGAGGCAG TAGCTGTGCC CATTCATGCT 300

AAGCTTCAGC CGAGAGCAAT CTCACTACTT CCTCTATTGG CTGGTTTAGG ATTTACTACC 360

ACCTAGGAAG TGGACTCACA GCCTAGATGA AATCTCTCTC CAACTTACTC AAATCCAGGA 420

CCAAATAGAC TCATTAGCAG CTGTGGTTCT CCGAACCAGT GAGCACTAGA TCTCCAATCT 480

CCTCACTGCC GAAAGGGGAG GAACATGCCT TTTTCTGAAC AAGGAATGTT GTTTTTATGT 540

CAATAAATCA GGCATAGTGA GAGATGGAAT TAAATGACTT CAGGATAGAG CTAGCAGACT 600

ACATGGTGGG ACAACCGAAA CTACCTCAGG GTTCTCACAG CCTGTTCTCC ACTGGCTTCT 660 

TCCATTTTTA GGTCCCTTCC TTATGATTAT TCTAGGAGTA ACCTTTGGCC CATGTCTTTT 720

CAGTTCCTTC ATCCTTTCGT TTCTTCCTGA ATAGAATCAA TGAAACTAGA AATGTTACTG 780

CAGATGGAAC CTCAGATGAC TTCAACCAGC ACCTATTATC AAGGACCCCT AAACCAGCCT 840

GCCGGCCCAT ACCCGGACGT TGACACCCAA ACCACCTCTC ACGAGGAAAC CTCAGCTACA 900

GAACCCCTTC TATGCCCCTA TTCAGCAGGA AGCAATTAGA GTGGTCATCC TCCCACACCC 960 

CAA 963 
(2) INFORMATIONS FOR SEQ ID NO: 13: HG8 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1362 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CCACAATATC CTCTTCCAGG AGGAGAACGA TGGCCACCTG AGGGAAGTAT ACACTATAAT 60 

ACCATCCTGC AACTAGATCT GTTTTGTAAA CAAGAAGGCA AGTGGATTTA GGTACCATAT 120 

GTTCAGACCT TTTTCTCATT AAGGGATGAT AACCCACGAT TGTGTAAGAC ATGTAACCTG 180 

CACCCCACAG GGAGTCCTCA AATTCTACCC CCATACCCAG TCCTCCCCAC GGCTCCTCCT 240

ACTAATGCCA AACCCTCTCT GGCTTCTACA GCCCAAAAGG GAACAAATAA AAGAGCCTTC 300

AGAGAGCCAA GAGACCCCAC TGGCCCCTGG CTATGTCCTC TTCAGGCTGT AGGAGGGGAA 360

TTTGGCCCAA CCCGAGTACA TGTTCCCTTT TCTCTCTCTG ATCTAAAGCA AATTAAGGCA 420

GACTTGGATG AAAGTTCTCA GATGACCCCA ATAGATACGT AGATGGCCTG CTGGGTCTGG 480

GACAATCTTT TGACCTTTCC TGGAGAGAGA TCATGTTATT GCTTGATCAG ACCTAACCTC 540

TAATGAGAAG AATGCTGCTT TAACAGGAGC CCGAGAGTTT GGGGATACCT GGTACCTCAG 600

TTAAGTAAGT GATAGAATGA CATCAGAAGA GAGCAGTTTC CTACTGGCCA GCAAGCAGTC 660 

CCCAGTATGG ATCCCCACTG GGACCCTGAC TCGGATCATG GGGACTGGAG TCACAAACAT 720

TTACTGACCT GTATCCTAGA AGGGTTAAGG AGAACTAGGA AAAAGCCCAT GAACTATTCA 780

ATGATGTCTA CTATAACCCA AGGGAAGGAA GAAAACCCTA TTGCCTTCCT CAAAAGGCTG 840

AGGGAGGCTT TGAGAAAATA TACTCCCCTG TCACCAGATT CCCTCGAAGG CCAGTTAATT 900 

TTAAAGGACA AATTTATTAC TCAGTCAGCT GCAGACATTA GGAAAAAGCT CCAAAAGTTA 960

GCCTTGGGCC GAGCAAAATT TGGAGGCATC ATTAAACCTG GCAACCTCAG TGTTCTATCA 1020

TAGGGACCAA GAGGAACAGG CCGAAAAGGA AAAGCAGGAT AAGAGAAAGG CTGCAGATTT 1080

AGTCATGCCC TCAGACAAAC CTTGGCGGTT CAAAGAGGAG AAAAAATGGA GCAGGCCAAT 1140

CACCCAGCAG GGCTTATTAT CAGTGCAGTT TACAAGGACA CTTTAAACAA GATTGTCCAA 1200

AGAGAAATAA GCCGCCCTCT CACCCATGTC CACTATGCCA AGGTGATCAC TGGAAGGCAC 1260

ACTGTCCCAG AGGACAAAGG TTCTCTGGGC CAGAAGTCCC CAACCAGATG ATCCAGCAAC 1320

AGGATGGAGG GTGCCCGGGG CAAGCACCAG CTCGTGTTGT CA 1362
(2) INFORMATIONS FOR SEQ ID NO: 14: HE9 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 945 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TTGCAGATCA ATCTCAGACT GCTGTGCTAG CAATGAGTGA GGCTTCGTGG GCATGGGACC 60 

CTCTGAGCCA GGCATGGGAT ATAATGTCCT TGTGTGCCAT TTGCTAAGAC TGTTGGAATA 120 

GCACAGTATT AGGGTGGGAG TGGCCCGATT TTCCAGGTGC TGTCTGTCAC CGCTTCCCTT 180

GGCTAGGAAA GAGAATTCCC TGACCCCTTG TTCTTCCCAG GTAAGGCAGT GCCTCACCCT 240

GCTTCAGCTC ACACTCAGGT GACTGCACCC ACTGTCCTGC CCCCACTGTC GGACAAGCCC 300 

CAGTGAGATG AACCTGGTAC CTCAGTTGGA AATGCAGAAA TCACCTGTCT TCTGCGTCAC 360

TCACACTGGG AGCTGTAGAC TGGAGCTGTT CCTATTTGGC CATCTTGGAA CCATCTCCCA 420 

AATAGACTCT TTGGCAGCAG TGACTCTCCA AAACCACCAA GGCCTAGACC TCCTCATTGC 480 

TGAGAAAGGA GGACTCTGCA CCTTCTTAGG GGAGGAGTGT TGTTTTTATA CTGACCAGTC 540

AGGGATGGTA CGAGATGCCA CCCGATGTTT ACAGGAAAAG GCTTCTGAAA TCACACAACA 600

CCTTTCAAAC TCTTATACCA ACCTCTGGAG TTGGGCAACA TGGCTTCTCC CCTTTCTCGG 660

TCCCATTGCA GCCATCTTGC TATTACTCGC CTTCAGGCTG TGTATTTTTA ACCTCCTTGT 720

CAAATTTGTT TCCTCTAGAA TTGAGGCCGT CAAGCTACAG ATGGTCTTAC AAATGGGACC 780

CCAAATGAGC TCAACTAACA ACTTCTGCCA AGGACCCCTG GACCAACCTG CTGGCCCTTT 840

CACTGGCCTT AAGAGTTCCC CTCTGGAGGG CACTACAACT GCAGGGCCCC TTCTTTGCCC 900

CTATCCAGCA GGAAGTAGCT AGAGCAGTCA TCACCCAATT CCCAA 945



(2) INFORMATIONS FOR SEQ ID NO: 15: HE10 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 939 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AGAGCTACCT TGGCAAGTAC TCTAGGAGTA TGGGAAAATG AAAACAACAA ACTCACACAC 60 

CATTTTAACA TACACAATCA GGTCTGCCCA CCCAGCAAGG TATATTCTTT GTATGTGGAA 120 

CATCGACCTA TATCTGCCTC CCCACTAACT AGACAGCCAC CTGAATCTTA GTCTTTCTAA 180 

GTCCCAACAG TAACATTGCC CCAGGAAATC AGACCATATC AGTATCCCTC AAAGCTCAAG 240

TCTGTCAGTG CAGAGCCATA CAACTAATAC CCCTACTTAT AGGGTAAGGA ATGGCTACTG 300 

CTACAGGAAC CAGAATAGCT AGTTTGTTTA CTTCATTATC CTACTACCAC ACACTCTCAA 360

ATGATTTCTC AGACAGTTTG CAAGAAATAA CGAAATCTAT CCTTACTCTA CAATCCCAAA 420

TAGACTCCTT GGCAGCAGTG ACCCTCCAAA ACGGCTGAGG CCTAGACCTC CTCACTGCCA 480

AGAAAGGAGG ACTCTGCATT TTCTTAGGGG AAGAGTGTTT TTACACTAAC CAGTCAGGGA 540

CAGTATGAGA TGCCACTCGG AGTTTACAGG AAAAGGCTTC TGAAGTCAGA CAATGCCTTT 600 

CAAACTCTAT ACCAAACTCT GGAGTTGGGC AACATGGCTT CTCCCCTTTC TAGGTCCCGT 660 

GACAGCCATC TTGCTATTAT TTGCCTTTGA GCCCTGTATT TTTAATCTCC TTTTCAAATT 720 

TGTTTCCTCT GGATCGAGGC CATCGAGCTA CAGATGGTCT TCACAAATGG AACCCCAAAT 780

GAGCTCAACT AACAACTTCT ACTGAGGACC CCTGGACTAA CCTGCTGACC CTTTCACTGG 840

CCTGAAGAAT TCCCCTCTGG AGGACACTAC AACTGCAGGG CTCCTTCTTT GCCCCTATCC 900

AGCAGGAAGT AGCTAGAGCT GTCATTGCCT AATTCCTAA 939

(2) INFORMATIONS FOR SEQ ID NO: 16: HE11 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 979 base pairs

(B) TYPE: nucleotide

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

AGTGATAATG GAATACTTGA AAGTAATCCC CTCACTCCCC AGGAACTAGT GCTCAGCTGG 60 

CAGAACTAAT AGCCCTCACT CGGGTACTAG AATCAGGAGA AGGAAAAAGG GTAAATATAT 120

ATACAGACTC TAAGTGTGCT TACCTAGTCC TCCATGCCCA TGCAGCAATA TGGAGAGAAA 180 

GGGAATTCCT AACTTCCGAG GGAACACCTA TCAAACATCA GGAAGCCATT AGGAAATTAT 240

TATTGGCTGT ACAGAAACCT AAAGAGGTGG CAGTTTTACA CTGCCGGGGT CATCAGAAAG 300

GAAAGGAAAG GGAAATACAA GGGAGCCACC AAGTTGATAT TGAAGTCAAA AGAGCCACAA 360 

GGCTGGACCC TCCATTAGAA ATGCTTATAG GAGGACCCCT AGTATGGGGT AATCCCCTCC 420

GGGAAGCCAA GCCCCAGTAC TCAGCAGGAG AAATAGAATA GGGAACTTCA TGAGGACATA 480

CTTCCCTCCC CTCCAGATGG CTAGCCACCA ATAAAGGAAA AATACTTTTG CCTGCAGCTA 540

ACCAATAGAA ATTACTTAAA ACCCTTCATC AAACCTTCCA CTTAGGCATT GATAGCACCC 600

ATGAGATGGC CAAATTATTA TTTACTGGAC CAGGCCTTTT CAAAACTATC AAGCAGATAG 660

TCAGGGCCTG TAAAGTCTGC CAAAGAAATA ATCCCCTGCA CTGCAGGCCA TACATTTCAA 720

TCCCTGTATC TTTAACCTCC TTCTTAAATT TGTCTCTTCC AGAATCAAAG CTGTAAAATT 780

ACAAATAGTT CTTCAAATGG AGCCACAGAT GCAGTCCATG ACTAAGATCC ACCACAGACC 840

CCTGGACCAG CCTGCTAGCC CATGCTCCAA TGTTAATGAC ATCGAAGGCA CCCCCTCCTG 900 

AGGAAATCTC AACTGCACAA CCCCTACTAC GCCCCAATTC AGCAGAAAGC AGTTAGAGTG 960 

GTCATCAGCC AACCTCCCC 979
(2) INFORMATIONS FOR SEQ ID NO: 17: HG11 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1774 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CATGCTGGTAAAGGACCGCTAGAATCCAGCAGCCAGGACCACTTTCTTTGTGGTCAAGAAAGGTGGGAAAACAG 

GTGCAGGACTGCTACACTGGTAAGCATAACTAATCCGATAAGCAGAGGTCCATGGGTGGTTACGCACCCTGGAAAGGAAT 

AAGCATTAGGACTATAGAGGACACTCTAGGACTAATGCTCATCGGAAAATGACTAGGGGTACTGGCATCCCTATGTTCTT 

TTTTCAGATGGGAAATGTTCCCCCCAAGGCAGAAATGCCCCTAAGATGTATTCTGGAGAAATGGGACCAATCTGACCATC 

AGACACTAAGAAAGAAATGACTTATATTCTTCTGCAGTACCACCTGGCCACAATATCTTCTTCAAGGGGCAGAAACCTGG 
CCTCCTGAGGGAAGTATAAATTATAACACCATCTTACAGCTAGACCTCTTTTGTAGAAAAGAAGGCAAATGGAGTGAAGT
GCCATATGTACAAACTTTCTTTTCATTAAGAGATAACTCCCAATTATGTAAAAAGTGTGATTTATGCCCTACAGGAAGCC 
CTCAGAGTCTACCTCCCGACCCCAGCAAGACCCCAACTCCTTCTCCAACTAATAAGGACCCCCCTTCAACCCAAATGGTC 
CAAAAGGAGATAGACAAAGGGGTAAACAATGAACCAAAGAGTGCCAATATTAGACGATTATACTCGCTCCAAGCAGTGGG
AGGAGAATTTGGCCCAGCCAGCGTGCATGTACCTTTTTCTCTCTCAGATTTAAAGCAAATTTAAATAGACCTAGGTAAAT
TCTCAGATAACCCTGATGGCTATATTGATGTTTTACAAGGGTTAGGACAATCCTTTGATCTGACATGGAGAGATATAATG 
TTACTGCTAAATCAGACACTAACCCCAAATGAAAAAAGTGCTGCCATAACAGCAGCCTGAGAGTTTGGCGAACTCTGGTA 
TCTCAGTCAGGTCAATGATAGGATGACAACAGATGAAAGAGAATGATTCCCCACAGGCCAGCAGGCAGTTCCCAGTGTAG 
ACCCTCATTAGGACACAGAATCAGAACTTGGAGATTGGTGCCACAGACATTTGCTAACTTGCGTGCTAGAAGGACTAAGG 
AAAACTAGGAAGAAGCCCATGAATTATTCAATGATGTCCCCTATAACACAGGGAAAGGAAGAAAATCCTACTGCCTTTCT 
GGAGAGACTAAGGGAAGGATTGAGGAAGCATACCTCCCTGTCACCTGACTCTATTAAAGGCCAACTAATCTTAAAGGATA 
AGTTTATCACTCAGTCAGCTGCAGAGATTAAGAAAAAACTTCAAAAGTATGCCTTAGGCCCAGAGCAAAACTTAGAAACC
CTACTGAACTTGGCAACCTCAGTTTTTTATAATAGAGATCAGGAAGAGCAGGGGAATGGGACAAATGGGATAAAAAAAAA 
AAAAAAAGGTGACTGCTTTAGTCGTGGCCCTCAGGCAAATGGACTTTGGAGGCTCCAGAAAAGGGAAAAGCTGAGCAAAT 
TGAATGCCTAACAGGGCTTGCTTCTAGTGTGGTCTACAAGGACACTTTAAAAAAGATTGTCCAAGTAGAAACAAGCTGCC 
CCCTTGTCCATGCCCCTTATGTCAAGGGAATCACTGGAAGGCCCACTGCCCCAGGAGATGAAGGTCCTCTGAGTCAGAAG 
CCACTAACCAGATAATCCAGCAGCAGGACTGAGGATGCCCAGGGCAAGCGCCAGCCCATGCCATCACCCTCACAGAGCCT 
TGGGTATGCTTGACCATTGA 

(2) INFORMATIONS FOR SEQ ID NO: 18: HE12

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 938 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

TGTAGGAAGA ACTCCCTTCA GGACAGGACA ATAGATGGTT CCTCCCAGGT GATTAAGGAA 60 

AAAAGACACA GTATTCAGTA AGTGATAAGG AAACTCTTGT AGAAGCAGAG TTAGAAAAAT 120

TGCCTAATAA TTGGTCTGCT CAAATGTGTG AGTTGTTTGC ACTCAGCCAA ATCTTAAAGT 180 

ACTTACAGAA TCAGGAAGCA GCCATCTATA CCAATTCTAA GTTAATATGG ACTAAACGAG 240

GTTTTATTAG TAGCAAAGAA AAATTAAAAT CCCAAACTTA CAAGGTTTTC AACTAAAGTT 300 

TGCCAAAAGT TAACAGTGTA ACATGTATTA TCCTACTATC ACACACTCTC AAAGGATTTC 360

TCAGACAGTT TGCAAGAAAT AACGTAATCT ATCCTTACTC TACAGTCCCA AATAGACTCT 420

TTGGTAGCAG TGACTCTCCA AAACTGCCGA GGTCTAGACC TCCTCAATGC TGAGAAAGGA 480

GAACTCTGCA CCTTCTTAGG GGAAGAGTGC TGTTTTTACA CTAACCAGTC AGGGATAGTA 540

TGAGATACTG CCTGACGTTT ACAGGAAAAG GCTTCTGAAA TCAGACAACG CCTTTCAAGC 600

TCTTATACCA ACCTCTGGAG TTGGGCAACA TGGCTTCTCC CCTTGCTAGG TCCTGTGGCA 660 

GCCATCTTGC TATTACTTGC CTTCGGGCCC TGTATTTTTA ACCTCCTTGT CAAATTTGTT 720 

TCCTCTAGGA TCAAGGCCAT CAAGCTACAG ATGGTCTTAC AAATGGAACC CCAAATGAGC 780

TCAACTAACA ACTTCTACTG AGGACACCTG GACTGACCCA CTGGCCCTTT CACTGGCCTA 840

AAGAGTTCCC TTCTGGAGGA CACTACAACT GCAGGGCCCC GTCTTCACCC CTATCCAGCA 900

GGAAGTAGCT AGATCAGTCA TTGCCCAATT CCCAACAG 938
(2) INFORMATIONS FOR SEQ ID NO: 19: HG12 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1308 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GATGCTTGCC CCAGGCACCC TCAGTCCTGT TGTTGGATCA TCTGGTCGGG GGCTTCTGGC 60 

CCAAAGAACC TTTGTCCTCT GAGGCAGTGC ACCTTCCAGT GATTGCCTCA GCATTGTGGA 120 

CATGGGCAAG GGGGCAGCTT GTTTCTCACT GGACAATCTT TTTTAAGGTG TCCTTCCAAA 180

CCACACTGGT AACAAGCCCT ACCAGGTGAT TGGCCTGCTC TATTTTCTGT CCTCTCTGAA 240

CCACCAAGGT TTGTCTGTCT GAGGGTCATG ACTAAGGCTG TGGCCTTTCT CTGATCTTGC 300 

TTTTCCTTTT TGGCCTGTTC CTCTTGGTAC CTATTATAGA ACACTGAGGT TGCCAGGTTT 360 

AACAATGGCT CCAGATTTTG TTCAGGGCAC AGGGCTCATT TTGGAGCTTT CTCCTGATAT 420

CTGCAGCTGA TTGGGTAATA AACTTATCTT TTAGGATCAA TTGACTCTCA AGAGAGTTGG 480 

GTGACAGGGG AGTATATTTC CTTGAGGCCT CCCATAGCCG CTCTAGGAAG GCAGAAGGAT 540

TTTCTTCCTT TCCCTGAGTT ATAAAAGACA TCATTGAACA ACTCATGGAC TTTTTCCCAA 600 

TTCTCCGTAG TCCTTCTAGA ACACAGGTCA GCAGATGTTT ACGACTCCAG TCCCCATGAT 660 

CTGAGTCTAG ACACCAGTGG GGATCCATAC TGGGGATGGC CTGCTGACTG GTAGGGAATT 720

TGTCCCTTTC TTTGGCTGTC ATTCTATCAT TTACTTGACT AAGATACCAA GTATCTCCAA 780

ATTCTCAGGC TGCAGCTAAA GCTGCATTCT TTTCATTAAA GGCCAGGGTT TGATCTAATA 840

GCATGACATC TCTCCAAGTG AGGTCAAAGG TTTGCCCTAG ATCCATAGGA CATCAGAGAA 900

GGAGAAGGGG ACATACACCT GAGTTAGCCA AATTCCCCTC CCTCTACAGC TTGAAGGGGA 960 

CATAAGCAAT AGCCTGGGGA TTTTTGTGGT CCTTTGGAGA TTTCTTTGCT TGTTTCCTTC 1020 

TGGGTGGGGG AGATTAGAGG AGGCTTATCA GTAATAGGAA GGGGAGCTAT AGGGAGGCTA 1080 

GGATATGGGG GTAAGCTGAG AGGTCATCTT GTGGGATGTA AATTGCAAGC TTTGCATAGT 1140

TGTGGATTTT CCTTACAATG AAAATAAAGC TTGGACATAA GGTATTTCAC TCCATTTGCC 1200 

TTCCCTCTTA CAGAAAAGGT CAAGCTGCAG GATAGTACTG TAATTTATAC TTCCTTCAGG 1260

TGGCCATTTC TTCCCATCAG AGAGAGAATA CTGGGGCTGG GCCATAGT 1308 
(2) INFORMATIONS FOR SEQ ID NO: 20: R1

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 711 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

ACTGAGAGAC AGGACTAGCT GGATTTCCTA GGCCGACTAA GAATCCCTAA GCCTAGCTGG 60 

GAAGGTGACC ACGTCCACCT TTAAACACGG GGCTTGCAAC TTAGCTCACA CCTGACCAAT 120 

CAGAGAGCTC ACTAAAATGC TAATTAGGCA AAGACAGGAG GTAAAGAAAT AGCCAATCAT 180

CTATTGCCTG AGAGCACAGC AGGAGGGACA ACAATCGGGA TATAAACCCA GGCATTCGAG 240

CTGGCAACAG CAGCCCCCCT TTGGGTCCCT TCCCTTTGTA TGGGAGCTGT TTTCATGCTA 300 

TTTCACTCTA TTAAATCTTG CAACTGCACT CTTCTGGTCC ATGTTTCTTA CGGCTCGAGC 360 

TGAGCTTTTG CTCACCGTCC ACCACTGCTG TTTGCCACCA CCGCAGACCT GCCGCTGACT 420

CCCATCCCTC TGGATCCTGC AGGGTGTCCG CTGTGCTCCT GATCCAGCGA GGCGCCCATT 480

GCCGCTCCCA ATTGGGCTAA AGGCTTGCCA TTGTTCCTGC ACGGCTAAGT GCCTGGGTTT 540

GTTCTAATTG AGCTGAACAC TAGTCACTGG GTTCCATGGT TCTCTTCTGT GACCCACGGC 600 

TTCTAATAGA ACTATAACAC TTACCACATG GCCCAAGATT CCATTCCTTG GAATCCGTGA 660

GGCCAAGAAC TCCAGGTCAG AGAATACGAG GCTTGCCACC ATCTTGGAAG C 711
(2) INFORMATIONS FOR SEQ ID NO: 21: R1F 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 711 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

ACTGAGAGAC AGGACTAGCT GGATTTCCTA GGCTGACTAA GAATCCCTAA GCCTAGCTGG 60 

GAAGGTGACC ACATCCACCT TTAAACACGG GGCTTGCAAC TTAGCTCACA CCTGACCAAT 120 

CAGAGAGCTC ACTAAAATGC TAATTAGGCA AAGACAGGAG GTAAAGAAAT AGCCAATCAT 180

CTATTGCCTG AGAGCACAGC AGGAGGGACA ATGATCGGGA TATAAACCCA AGTCTTCGAG 240

CCGGCAACGG CAACCCCCTT TGGGTCCCCT CCCTTTGTAT GGGAGCTCTG TTTTCATGCT 300

ATTTCACTCT ATTAAATCTT GCAACTGCAC TCTTCTGGTC CATGTTTCTT ACGGCTTGAG 360

CTGAGCTTTC GCTCGCCATC CACCACTGCT GTTTGCCGCC ACCGCAGACC CGCCGCTGAC 420

TCCCATCCCT CTGGATCATG CAGGGTGTCC GCTGTGCTCC TGATCCAGCG AGGCACCCAT 480

TGCCGCTCCC AATCGGGCTA AAGGCTTGCC ATTGTTCCTG CATGGCTAAG TGCCTGGGTT 540

CATCCTAATT GAGCTGAACA CTAGTCACTG GGTTCCATGG TTCTCTTCTG TGACCCACAG 600 

CTTCTAATAG AGCTATAACA CTCACCGCAT GGCCCAAGGT TCCATTCCTT GAATCCATAA 660 

GGCCAAGAAC CCCAGGTCAG AGAACACGAG GCTTGCCACC ATCTTGGGAG C 711 

(2) INFORMATIONS FOR SEQ ID NO: 22: HERV-7q (CODING SEQUENCE WITH 3 READING FRAMES)

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleotide

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
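
Editorial illustration, not part of the original listing: each row of SEQ ID NO: 22 shows 75 nucleotides above their conceptual translation in the three forward reading frames, with stop codons shown as dots. The Python sketch below shows one way such a three-frame layout could be derived with the standard genetic code. The names `CODON_TABLE`, `translate_frame` and `three_frames` are hypothetical helpers introduced here. The sketch translates a single row in isolation, whereas in the listing each frame runs continuously through the whole sequence, so a codon that straddles a row boundary is only completed when the full sequence is supplied.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Three-letter codes as used in the listing (methionine written "MET", stop as dots).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "MET", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "...",
}


def translate_frame(dna, offset):
    """Translate complete codons of `dna` starting at `offset` (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate_frame(dna, k) for k in range(3)]


if __name__ == "__main__":
    # First row of SEQ ID NO: 22 as printed in the listing.
    row = ("AAGCTCCTTCAGGAGAACAAAGAACAGGCCATTACCCTGGAGAAGACTGGCAACTGA"
           "TTTTACCCACAAGCCCAA")
    print(row)
    for frame in three_frames(row):
        print(frame)
```

Applied to the full concatenated sequence rather than to one row, the same functions give the continuous frames, which can then be cut into 25-codon segments to reproduce the row layout.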

AAGCTCCTTCAGGAGAACAAAGAACAGGCCATTACCCTGGAGAAGACTGGCAACTGATTTTACCCACAAGCCCAA 
LysLeuLeuGlnGluAsnLysGluGlnAlaIleThrLeuGluLysThrGlyAsn...PheTyrProGlnAlaGln
SerSerPheArgArgThrLysAsnArgProLeuProTrpArgArgLeuAlaThrAspPheThrHisLysProLys 
AlaProSerGlyGluGlnArgThrGlyHisTyrProGlyGluAspTrpGlnLeuIleLeuProThrSerProAsn 

ACCTCAGGGATTTCAGTATCTACTAGTCTGGGTAGATACTTTCACGGGTTGGGCAGAGGCCTTCCCCTGTAGGAC 
ThrSerGlyIleSerValSerThrSerLeuGlyArgTyrPheHisGlyLeuGlyArgGlyLeuProLeu...Asp
ProGlnGlyPheGlnTyrLeuLeuValTrpValAspThrPheThrGlyTrpAlaGluAlaPheProCysArgThr
LeuArgAspPheSerIleTyr...SerGly...IleLeuSerArgValGlyGlnArgProSerProValGlyGln

AGAAAAGGCCCAAGAGGTAATAAAGGCACTAGTTCATGAAATAATTCCCAGATTCGGACTTCCCCGAGGCTTACA 
ArgLysGlyProArgGlyAsnLysGlyThrSerSer...AsnAsnSerGlnIleArgThrSerProArgLeuThr
GluLysAlaGlnGluValIleLysAlaLeuValHisGluIleIleProArgPheGlyLeuProArgGlyLeuGln
LysArgProLysArg......ArgHis...PheMETLys...PheProAspSerAspPheProGluAlaTyrArg

GAGTGACAATAGCCCTGCTTTCCAGGCCACAGTAACCCAGGGAGTATCCCAGGCGTTAGGTATACGATATCACTT 
Glu...Gln...ProCysPheProGlyHisSerAsnProGlySerIleProGlyValArgTyrThrIleSerLeu
SerAspAsnSerProAlaPheGlnAlaThrValThrGlnGlyValSerGlnAlaLeuGlyIleArgTyrHisLeu
ValThrIleAlaLeuLeuSerArgProGln...ProArgGluTyrProArgArg...ValTyrAspIleThrTyr

ACACTGCGCCTGAAGGCCACAGTCCTCAGGGAAGGTCGAGAAAATGAATGAAACACTCAAAGGACATCTAAAAAA
ThrLeuArgLeuLysAlaThrValLeuArgGluGlyArgGluAsnGlu...AsnThrGlnArgThrSerLysLys
HisCysAla...ArgProGlnSerSerGlyLysValGluLysMETAsnGluThrLeuLysGlyHisLeuLysLys
ThrAlaProGluGlyHisSerProGlnGlyArgSerArgLys...METLysHisSerLysAspIle...LysSer

GCAAACCCAGGAAACCCACCTCACATGGCCTGCTCTGTTGCCTATAGCCTTAAAAAGAATCTGCAACTTTCCCCA
AlaAsnProGlyAsnProProHisMETAlaCysSerValAlaTyrSerLeuLysLysAsnLeuGlnLeuSerPro
GlnThrGlnGluThrHisLeuThrTrpProAlaLeuLeuProIleAlaLeuLysArgIleCysAsnPheProGln
LysProArgLysProThrSerHisGlyLeuLeuCysCysLeu...Pro...LysGluSerAlaThrPheProLys

AAAAGCAGGACTTAGCCCATACGAAATGCTGTATGGAAGGCCCTTCATAACCAATGACCTTGTGCTTGACCCAAG 
LysSerArgThr...ProIleArgAsnAlaValTrpLysAlaLeuHisAsnGln...ProCysAla...ProLys
LysAlaGlyLeuSerProTyrGluMETLeuTyrGlyArgProPheIleThrAsnAspLeuValLeuAspProArg
LysGlnAspLeuAlaHisThrLysCysCysMETGluGlyProSer...ProMETThrLeuCysLeuThrGlnAsp
ACAGCCAACTTAGTTGCAGACATCACCTCCTTAGCCAAATATCAACAAGTTCTTAAAACATTACAAGGAACCTAT
ThrAlaAsnLeuValAlaAspIleThrSerLeuAlaLysTyrGlnGlnValLeuLysThrLeuGlnGlyThrTyr 
GlnProThr...LeuGlnThrSerProPro...ProAsnIleAsnLysPheLeuLysHisTyrLysGluProIle
SerGlnLeuSerCysArgHisHisLeuLeuSerGlnIleSerThrSerSer...AsnIleThrArgAsnLeuSer

CCCTGAGAAGAGGGAAAAGAACTATTCCACCCTTGTGACATGGTATTAGTCAAGTCCCTTCCCTCTAATTCCCCA 
Pro...GluGluGlyLysGluLeuPheHisProCysAspMETValLeuValLysSerLeuProSerAsnSerPro
ProGluLysArgGluLysAsnTyrSerThrLeuValThrTrpTyr...SerSerProPheProLeuIleProHis
LeuArgArgGlyLysArgThrIleProProLeu...HisGlyIleSerGlnValProSerLeu...PheProIle

TCCCTAGATACATCCTGGGAAGGACCCTACCCAGTCATTTTATCTACCCCAACTGCGGTTAAAGTGGCTGGAGTG 
SerLeuAspThrSerTrpGluGlyProTyrProValIleLeuSerThrProThrAlaValLysValAlaGlyVal
Pro...IleHisProGlyLysAspProThrGlnSerPheTyrLeuProGlnLeuArgLeuLysTrpLeuGluTrp
ProArgTyrIleLeuGlyArgThrLeuProSerHisPheIleTyrProAsnCysGly...SerGlyTrpSerGly

GAGTCTTGGATACATCACACTTGAGTCAAATCCTGGATACTGCCAAAGGAACCTGAAAATCCAGGAGACAACGCT 
GluSerTrpIleHisHisThr...ValLysSerTrpIleLeuProLysGluProGluAsnProGlyAspAsnAla
SerLeuGlyTyrIleThrLeuGluSerAsnProGlyTyrCysGlnArgAsnLeuLysIleGlnGluThrThrLeu
ValLeuAspThrSerHisLeuSerGlnIleLeuAspThrAlaLysGlyThr...LysSerArgArgGlnArg...

AGCTATTCCTGTGAACCTCTAGAGGATTTGCGCCTGCTCTTCAAACAACAACCAGGAGGAAAGTAACTAAAATCA 
SerTyrSerCysGluProLeuGluAspLeuArgLeuLeuPheLysGlnGlnProGlyGlyLys...LeuLysSer
AlaIleProValAsnLeu...ArgIleCysAlaCysSerSerAsnAsnAsnGlnGluGluSerAsn...AsnHis
LeuPheLeu...ThrSerArgGlyPheAlaProAlaLeuGlnThrThrThrArgArgLysValThrLysIleIle

TAAATCCCCATGGCCCTCCCTTATCATATTTTTCTCTTTACTGTTCTTTTACCCTCTTTCACTCTCACTGCACCC 
...IleProMETAlaLeuProTyrHisIlePheLeuPheThrValLeuLeuProSerPheThrLeuThrAlaPro
LysSerProTrpProSerLeuIleIlePhePheSerLeuLeuPhePheTyrProLeuSerLeuSerLeuHisPro
AsnProHisGlyProProLeuSerTyrPheSerLeuTyrCysSerPheThrLeuPheHisSerHisCysThrPro 

CCTCCATGCCGCTGTATGACCAGTAGCTCCCCTTACCAAGAGTTTCTATGGAGAATGCAGCGTCCCGGAAATATT 
ProProCysArgCysMETThrSerSerSerProTyrGlnGluPheLeuTrpArgMETGlnArgProGlyAsnIle
LeuHisAlaAlaVal...ProValAlaProLeuThrLysSerPheTyrGlyGluCysSerValProGluIleLeu
SerMETProLeuTyrAspGln...LeuProLeuProArgValSerMETGluAsnAlaAlaSerArgLysTyr...

GATGCCCCATCGTATAGGAGTCTTTCTAAGGGAACCCCCACCTTCACTGCCCACACCCATATGCCCCGCAACTGC 
AspAlaProSerTyrArgSerLeuSerLysGlyThrProThrPheThrAlaHisThrHisMETProArgAsnCys 
METProHisArgIleGlyValPheLeuArgGluProProProSerLeuProThrProIleCysProAlaThrAla
CysProIleVal...GluSerPhe...GlyAsnProHisLeuHisCysProHisProTyrAlaProGlnLeuLeu

TATCACTCTGCCACTCTTTGCATGCATGCAAATACTCATTATTGGACAGGAAAAATGATTAATCCTAGTTGTCCT 
TyrHisSerAlaThrLeuCysMETHisAlaAsnThrHisTyrTrpThrGlyLysMETIleAsnProSerCysPro 
IleThrLeuProLeuPheAlaCysMETGlnIleLeuIleIleGlyGlnGluLys...LeuIleLeuValValLeu
SerLeuCysHisSerLeuHisAlaCysLysTyrSerLeuLeuAspArgLysAsnAsp...Ser...LeuSerTrp

GGAGGACTTGGAGTCACTGTCTGTTGGACTTACTTCACCCAAACTGGTATGTCTGATGGGGGTGGAGTTCAAGAT 
GlyGlyLeuGlyValThrValCysTrpThrTyrPheThrGlnThrGlyMETSerAspGlyGlyGlyValGlnAsp 
GluAspLeuGluSerLeuSerValGlyLeuThrSerProLysLeuValCysLeuMETGlyValGluPheLysIle 
ArgThrTrpSerHisCysLeuLeuAspLeuLeuHisProAsnTrpTyrVal...TrpGlyTrpSerSerArgSer

CAGGCAAGAGAAAAACATGTAAAAGAAGTAATCTCCCAACTCACCCGGGTACATGGCACCTCTAGCCCCTACAAA
GlnAlaArgGluLysHisValLysGluValIleSerGlnLeuThrArgValHisGlyThrSerSerProTyrLys
ArgGlnGluLysAsnMET...LysLys...SerProAsnSerProGlyTyrMETAlaProLeuAlaProThrLys
GlyLysArgLysThrCysLysArgSerAsnLeuProThrHisProGlyThrTrpHisLeu...ProLeuGlnArg

GGACTAGATCTCTCAAAACTACATGAAACCCTCCGTACCCATACTCGCCTGGTAAGCCTATTTAATACCACCCTC 
GlyLeuAspLeuSerLysLeuHisGluThrLeuArgThrHisThrArgLeuValSerLeuPheAsnThrThrLeu 
Asp. . . IleSerGlnAsnTyrMETLysProSerValProIleLeuAlaTrp . . . AlaTyrLeuIleProProSer 
ThrArgSerLeuLysThrThr . . . AsnProProTyrProTyrSerProGlyLysProIle . . . TyrHisProHis

ACTGGGCTCCATGAGGTCTCGGCCCAAAACCCTACTAACTGTTGGATATGCCTCCCCCTGAACTTCAGGCCATAT
ThrGlyLeuHisGluValSerAlaGlnAsnProThrAsnCysTrpIleCysLeuProLeuAsnPheArgProTyr
LeuGlySerMETArgSerArgProLysThrLeuLeuThrValGlyTyrAlaSerPro . . . ThrSerGlyHisMET
TrpAlaPro. . . GlyLeuGlyProLysProTyr . . . LeuLeuAspMETProProProGluLeuGlnAlaIleCys

GTTTCAATCCCTGTACCTGAACAATGGAACAACTTCAGCACAGAAATAAACACCACTTCCGTTTTAGTAGGACCT 
ValSerIleProValProGluGlnTrpAsnAsnPheSerThrGluIleAsnThrThrSerValLeuValGlyPro
PheGlnSerLeuTyrLeuAsnAsnGlyThrThrSerAlaGlnLys . . . ThrProLeuProPhe . . . . . . AspLeu
PheAsnProCysThr . . . ThrMETGluGlnLeuGlnHisArgAsnLysHisHisPheArgPheSerArgThrSer

CTTGTTTCCAATCTGGAAATAACCCATACCTCAAACCTCACCTGTGTAAAATTTAGCAATACTACATACACAACC
LeuValSerAsnLeuGluIleThrHisThrSerAsnLeuThrCysValLysPheSerAsnThrThrTyrThrThr
LeuPheProIleTrpLys . . . ProIleProGlnThrSerProVal . . . AsnLeuAlaIleLeuHisThrGlnPro
CysPheGlnSerGlyAsnAsnProTyrLeuLysProHisLeuCysLysIle . . . GlnTyrTyrIleHisAsnGln

AACTCCCAATGCATCAGGTGGGTAACTCCTCCCACACAAATAGTCTGCCTACCCTCAGGAATATTTTTTGTCTGT
AsnSerGlnCysIleArgTrpValThrProProThrGlnIleValCysLeuProSerGlyIlePhePheValCys
ThrProAsnAlaSerGlyGly . . . LeuLeuProHisLys . . . SerAlaTyrProGlnGluTyrPheLeuSerVal
LeuProMETHisGlnValGlyAsnSerSerHisThrAsnSerLeuProThrLeuArgAsnIlePheCysLeuTrp

GGTACCTCAGCCTATCGTTGTTTGAATGGCTCTTCAGAATCTATGTGCTTCCTCTCATTCTTAGTGCCCCCTATG 
GlyThrSerAlaTyrArgCysLeuAsnGlySerSerGluSerMETCysPheLeuSerPheLeuValProProMET 
ValProGlnProIleValVal . . . METAlaLeuGlnAsnLeuCysAlaSerSerHisSer . . .CysProLeu. . . 
TyrLeuSerLeuSerLeuPheGluTrpLeuPheArgIleTyrValLeuProLeuIleLeuSerAlaProTyrAsp

ACCATCTACACTGAACAAGATTTATACAGTTATGTCATATCTAAGCCCCGCAACAAAAGAGTACCCATTCTTCCT 
ThrIleTyrThrGluGlnAspLeuTyrSerTyrValIleSerLysProArgAsnLysArgValProIleLeuPro
ProSerThrLeuAsnLysIleTyrThrValMETSerTyrLeuSerProAlaThrLysGluTyrProPhePheLeu
HisLeuHis. . . ThrArgPheIleGlnLeuCysHisIle . . . AlaProGlnGlnLysSerThrHisSerSerPhe

TTTGTTATAGGAGCAGGAGTGCTAGGTGCACTAGGTACTGGCATTGGCGGTATCACAACCTCTACTCAGTTCTAC 
PheValIleGlyAlaGlyValLeuGlyAlaLeuGlyThrGlyIleGlyGlyIleThrThrSerThrGlnPheTyr
LeuLeu. . . GluGlnGluCys . . .ValHis. . . ValLeuAlaLeuAlaValSerGlnProLeuLeuSerSerThr 
CysTyrArgSerArgSerAlaArgCysThrArgTyrTrpHisTrpArgTyrHisAsnLeuTyrSerValLeuLeu 

TACAAACTATCTCAAGAACTAAATGGGGACATGGAACGGGTCGCCGACTCCCTGGTCACCTTGCAAGATCAACTT 
TyrLysLeuSerGlnGluLeuAsnGlyAspMETGluArgValAlaAspSerLeuValThrLeuGlnAspGlnLeu 
ThrAsnTyrLeuLysAsn. . . METGlyThrTrpAsnGlySerProThrProTrpSerProCysLysIleAsnLeu
GlnThrIleSerArgThrLysTrpGlyHisGlyThrGlyArgArgLeuProGlyHisLeuAlaArgSerThr . . .

AACTCCCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCTGAAAGAGGGGGAACCTGT 
AsnSerLeuAlaAlaValValLeuGlnAsnArgArgAlaLeuAspLeuLeuThrAlaGluArgGlyGlyThrCys 
ThrPro. . .GlnGln. . . SerPheLysIleGluGluLeu . . . ThrCys . . . ProLeuLysGluGlyGluProVal 
LeuProSerSerSerSerProSerLysSerLysSerPheArgLeuAlaAsnArg . . . LysArgGlyAsnLeuPhe 

TTATTTTTAGGGGAAGAATGCTGTTATTATGTTAATCAATCCGGAATCGTCACTGAGAAAGTTAAAGAAATTCGA 
LeuPheLeuGlyGluGluCysCysTyrTyrValAsnGlnSerGlyIleValThrGluLysValLysGluIleArg
TyrPhe . . . GlyLysAsnAlaValIleMETLeuIleAsnProGluSerSerLeuArgLysLeuLysLysPheGlu
IlePheArgGlyArgMETLeuLeuLeuCys . . . SerIleArgAsnArgHis . . .GluSer. . . ArgAsnSerArg

GATCGAATACAACGTAGAGCAGAGGAGCTTCGAAACACTGGACCCTGGGGCCTCCTCAGCCAATGGATGCCCTGG 
AspArgIleGlnArgArgAlaGluGluLeuArgAsnThrGlyProTrpGlyLeuLeuSerGlnTrpMETProTrp
IleGluTyrAsnValGluGlnArgSerPheGluThrLeuAspProGlyAlaSerSerAlaAsnGlyCysProGly 
SerAsnThrThr . . . SerArgGlyAlaSerLysHisTrpThrLeuGlyProProGlnProMETAspAlaLeuAsp 



ATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGCTACTCCTCTTTGGACCCTGTATCTTTAACCTCCTT 
IleLeuProPheLeuGlyProLeuAlaAlaIleIleLeuLeuLeuLeuPheGlyProCysIlePheAsnLeuLeu
PheSerProSer . . .AspLeu. . .GlnLeu. . . TyrCysTyrSerSerLeuAspProValSerLeuThrSerLeu
SerProLeuLeuArgThrSerSerSerTyrAsnIleAlaThrProLeuTrpThrLeuTyrLeu. . . ProProCys

GTTAACTTTGTCTCTTCCAGAATCGAAGCTGTAAAACTACAAATGGAGCCCAAGATGCAGTCCAAGACTAAGATC 
ValAsnPheValSerSerArgIleGluAlaValLysLeuGlnMETGluProLysMETGlnSerLysThrLysIle
LeuThrLeuSerLeuProGluSerLysLeu . . . AsnTyrLysTrpSerProArgCysSerProArgLeuArgSer 
. . . LeuCysLeuPheGlnAsnArgSerCysLysThrThrAsnGlyAlaGlnAspAlaValGlnAsp. . .AspLeu 

TACCGCAGACCCCTGGACCGGCCTGCTAGCCCACGATCTGATGTTAATGACATCAAAGGCACCCCTCCTGAGGAA 
TyrArgArgProLeuAspArgProAlaSerProArgSerAspValAsnAspIleLysGlyThrProProGluGlu 
ThrAlaAspProTrpThrGlyLeuLeuAlaHisAspLeuMETLeuMETThrSerLysAlaProLeuLeuArgLys 
ProGlnThrProGlyProAlaCys . . .ProThrIle. . . Cys . . . . . . HisGlnArgHisProSer . . .GlyAsn



ATCTCAGCTGCACAACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACCTCCCCA 
IleSerAlaAlaGlnProLeuLeuArgProAsnSerAlaGlySerSer . . . SerGlyArgArgProThrSerPro 
SerGlnLeuHisAsnLeuTyrTyrAlaProIleGlnGlnGluAlaValArgAlaValValGlyGlnProProGln 
LeuSerCysThrThrSerThrThrProGlnPheSerArgLysGlnLeuGluArgSerSerAlaAsnLeuProAsn 

ACAGCACTTAGGTTTTCCTGTTGAGATGGGGG 
ThrAlaLeuArgPheSerCys . . . AspGlyGly 
GlnHisLeuGlyPheProValGluMETGly 
SerThr. . . ValPheLeuLeuArgTrpGly 
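
The three-frame layout above can be reproduced with a short script. The following is a minimal Python sketch, assuming the standard genetic code; the names `translate` and `three_frames` are illustrative and do not come from the original listing. Stop codons are written as `*` here, whereas the listing prints them as dots (three-letter form) or as `X` (one-letter form), and a trailing incomplete codon is dropped rather than completed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna from offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Unknown or garbled codons are reported as '?' instead of raising.
    return "".join(CODON_TABLE.get(codon, "?") for codon in codons)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    # Last nucleotide line of the listing above.
    tail = "ACAGCACTTAGGTTTTCCTGTTGAGATGGGGG"
    for number, protein in enumerate(three_frames(tail), start=1):
        print(number, protein)
```

Running such a check against each 75-nucleotide line is also a convenient way to spot characters that were corrupted when the listing was scanned.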

(2) INFORMATION FOR SEQ ID NO: 23: HERV-7q (DEDUCED ENV PROTEINS)

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 



PKTANLVADITSLAKYQQVLKTLQG
CCCAAGACAGCCAACTTAGTTGCAGACATCACCTCCTTAGCCAAATATCAACAAGTTCTTAAAACATTACAAGGA

TYPXEEGKELFHPCDMVLVKSLPSN 
ACCTATCCCTGAGAAGAGGGAAAAGAACTATTCCACCCTTGTGACATGGTATTAGTCAAGTCCCTTCCCTCTAAT 

SPSLDTSWEGPYPVILSTPTAVKVA 
TCCCCATCCCTAGATACATCCTGGGAAGGACCCTACCCAGTCATTTTATCTACCCCAACTGCGGTTAAAGTGGCT 

GVESWIHHTXVKSWILPKEPENPGD 
GGAGTGGAGTCTTGGATACATCACACTTGAGTCAAATCCTGGATACTGCCAAAGGAACCTGAAAATCCAGGAGAC 

NASYSCEPLEDLRLLFKQQPGGKXL
AACGCTAGCTATTCCTGTGAACCTCTAGAGGATTTGCGCCTGCTCTTCAAACAACAACCAGGAGGAAAGTAACTA

KSXIPMALPYHIFLFTVLLPSFTLT
AAATCATAAATCCCCATGGCCCTCCCTTATCATATTTTTCTCTTTACTGTTCTTTTACCCTCTTTCACTCTCACT

APPPCRCMTSSSPYQEFLWRMQRPG
GCACCCCCTCCATGCCGCTGTATGACCAGTAGCTCCCCTTACCAAGAGTTTCTATGGAGAATGCAGCGTCCCGGA

NIDAPSYRSLSKGTPTFTAHTHMPR
AATATTGATGCCCCATCGTATAGGAGTCTTTCTAAGGGAACCCCCACCTTCACTGCCCACACCCATATGCCCCGC 

NCYHSATLCMHANTHYWTGKMINPS 
AACTGCTATCACTCTGCCACTCTTTGCATGCATGCAAATACTCATTATTGGACAGGAAAAATGATTAATCCTAGT 

CPGGLGVTVCWTYFTQTGMSDGGGV 
TGTCCTGGAGGACTTGGAGTCACTGTCTGTTGGACTTACTTCACCCAAACTGGTATGTCTGATGGGGGTGGAGTT 

QDQAREKHVKEVISQLTRVHGTSSP 
CAAGATCAGGCAAGAGAAAAACATGTAAAAGAAGTAATCTCCCAACTCACCCGGGTACATGGCACCTCTAGCCCC 

YKGLDLSKLHETLRTHTRLVSLFNT
TACAAAGGACTAGATCTCTCAAAACTACATGAAACCCTCCGTACCCATACTCGCCTGGTAAGCCTATTTAATACC

TLTGLHEVSAQNPTNCWICLPLNFR
ACCCTCACTGGGCTCCATGAGGTCTCGGCCCAAAACCCTACTAACTGTTGGATATGCCTCCCCCTGAACTTCAGG

PYVSIPVPEQWNNFSTEINTTSVLV
CCATATGTTTCAATCCCTGTACCTGAACAATGGAACAACTTCAGCACAGAAATAAACACCACTTCCGTTTTAGTA

GPLVSNLEITHTSNLTCVKFSNTTY
GGACCTCTTGTTTCCAATCTGGAAATAACCCATACCTCAAACCTCACCTGTGTAAAATTTAGCAATACTACATAC

TTNSQCIRWVTPPTQIVCLPSGIFF 
ACAACCAACTCCCAATGCATCAGGTGGGTAACTCCTCCCACACAAATAGTCTGCCTACCCTCAGGAATATTTTTT

VCGTSAYRCLNGSSESMCFLSFLVP 
GTCTGTGGTACCTCAGCCTATCGTTGTTTGAATGGCTCTTCAGAATCTATGTGCTTCCTCTCATTCTTAGTGCCC 

PMTIYTEQDLYSYVISKPRNKRVPI 
CCTATGACCATCTACACTGAACAAGATTTATACAGTTATGTCATATCTAAGCCCCGCAACAAAAGAGTACCCATT

LPFVIGAGVLGALGTGIGGITTSTQ
CTTCCTTTTGTTATAGGAGCAGGAGTGCTAGGTGCACTAGGTACTGGCATTGGCGGTATCACAACCTCTACTCAG

FYYKLSQELNGDMERVADSLVTLQD
TTCTACTACAAACTATCTCAAGAACTAAATGGGGACATGGAACGGGTCGCCGACTCCCTGGTCACCTTGCAAGAT

QLNSLAAVVLQNRRALDLLTAERGG
CAACTTAACTCCCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCTGAAAGAGGGGGA

TCLFLGEECCYYVNQSGIVTEKVKE
ACCTGTTTATTTTTAGGGGAAGAATGCTGTTATTATGTTAATCAATCCGGAATCGTCACTGAGAAAGTTAAAGAA

IRDRIQRRAEELRNTGPWGLLSQWM 
ATTCGAGATCGAATACAACGTAGAGCAGAGGAGCTTCGAAACACTGGACCCTGGGGCCTCCTCAGCCAATGGATG 

PWILPFLGPLAAIILLLLFGPCIFN
CCCTGGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGCTACTCCTCTTTGGACCCTGTATCTTTAAC

LLVNFVSSRIEAVKLQMEPKMQSKT
CTCCTTGTTAACTTTGTCTCTTCCAGAATCGAAGCTGTAAAACTACAAATGGAGCCCAAGATGCAGTCCAAGACT

KIYRRPLDRPASPRSDVNDIKGTPP
AAGATCTACCGCAGACCCCTGGACCGGCCTGCTAGCCCACGATCTGATGTTAATGACATCAAAGGCACCCCTCCT

EEISAAQPLLRPNSAGSSXSGRRPT
GAGGAAATCTCAGCTGCACAACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACC 

SPTALRFSCX 
TCCCCAACAGCACTTAGGTTTTCCTGTTGA 



(2) INFORMATION FOR SEQ ID NO: 24: HERV-7q (GAG CODING SEQUENCE)

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

TSFVEKANGVKCHKY 
ACC TCT TTT GTA GAA AAG GCA AAT GGA GTG AAG TGC CAT AAG TAC 

KLSFHXETTHNYVKS
AAA CTT TCT TTT CAT TAA GAG ACA ACT CAC AAT TAT GTA AAA AGT

VIYALQEAFRVYLPI
GTG ATT TAT GCC CTA CAG GAA GCC TTC AGA GTC TAC CTC CCT ATC

PASPTPSPTNKDPPS
CCA GCA TCC CCG ACT CCT TCC CCA ACT AAT AAG GAC CCC CCT TCA

TQMVQKEIDKRVNSE
ACC CAA ATG GTC CAA AAG GAG ATA GAC AAA AGG GTA AAC AGT GAA

PKSANIPQLXPLQAV
CCA AAG AGT GCC AAT ATT CCC CAA TTA TGA CCC CTC CAA GCA GTG 

GGREFGPARVHVPFS 
GGA GGA AGA GAA TTC GGC CCA GCC AGA GTG CAT GTG CCT TTT TCT 



(2) INFORMATION FOR SEQ ID NO: 25: ENV PROTEIN (READING FRAME 1)

(i) SEQUENCE CHARACTERISTICS: 
(B) TYPE: AMINO ACID, 
(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 



PKTANLVADITSLAKYQQVLKTLQGTYPXEEGKELFHPCDMVLVKSLPSNSPSLDTSWEG 
PYPVILSTPTAVKVAGVESWIHHTXVKSWILPKEPENPGDNASYSCEPLEDLRLLFKQQP 
GGKXLKSXIPMALPYHIFLFTVLLPSFTLTAPPPCRCMTSSSPYQEFLWRMQRPGNIDAP 
SYRSLSKGTPTFTAHTHMPRNCYHSATLCMHANTHYWTGKMINPSCPGGLGVTVCWTYFT 
QTGMSDGGGVQDQAREKHVKEVISQLTRVHGTSSPYKGLDLSKLHETLRTHTRLVSLFNT
TLTGLHEVSAQNPTNCWICLPLNFRPYVSIPVPEQWNNFSTEINTTSVLVGPLVSNLEIT 
HTSNLTCVKFSNTTYTTNSQCIRWVTPPTQIVCLPSGIFFVCGTSAYRCLNGSSESMCFL
SFLVPPMTIYTEQDLYSYVISKPRNKRVPILPFVIGAGVLGALGTGIGGITTSTQFYYKL
SQELNGDMERVADSLVTLQDQLNSLAAVVLQNRRALDLLTAERGGTCLFLGEECCYYVNQ 
SGIVTEKVKEIRDRIQRRAEELRNTGPWGLLSQWMPWILPFLGPLAAIILLLLFGPCIFN
LLVNFVSSRIEAVKLQMEPKMQSKTKIYRRPLDRPASPRSDVNDIKGTPPEEISAAQPLL
RPNSAGSSXSGRRPTSPTALRFSCX 

(2) INFORMATION FOR SEQ ID NO: 26: gag PROTEIN

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: AMINO ACID 
(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

TSFVEKANGVKCHKYKLSFHXETTHNYVKSVIYALQEAFRVYLPIPASPTPSPTNKDPPS
TQMVQKEIDKRVNSEPKSANIPQLXPLQAVGGREFGPARVHVPFSLPDLKQIKTDLGKFS 
DNPDGYIDVLQGLGQFFDLTWRDIMSLLNQTLTPNERSATITAAXEFGDLWYLSQVNDRM 
TTEEREXFPTGQQAVPSLDPHWDTESEHGDWCCRHLLTCVLEGLRKTRKKSMNYSMMSTI 
TQGREENPTAFLERLREALRKRASLSPDSSEGQLILKRKFITQSAADIRKKLQKSAVGPE 
QNLETLLNLATSVFYNRDQEEQAEQDKRDXKKGHRFSHDPQASGLWRLWKREKLGKLNAX 

(2) INFORMATION FOR SEQ ID NO: 27: env PROTEIN (READING FRAME 1)

(i) SEQUENCE CHARACTERISTICS: 
(B) TYPE: AMINO ACID, 
(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

LysLeuLeuGlnGluAsnLysGluGlnAlaIleThrLeuGluLysThrGlyAsn. . . PheTyrProGlnAlaGln
ThrSerGlyIleSerValSerThrSerLeuGlyArgTyrPheHisGlyLeuGlyArgGlyLeuProLeu. . .Asp
ArgLysGlyProArgGlyAsnLysGlyThrSerSer . . . AsnAsnSerGlnIleArgThrSerProArgLeuThr
Glu. . .Gln. . . ProCysPheProGlyHisSerAsnProGlySerIleProGlyValArgTyrThrIleSerLeu
ThrLeuArgLeuLysAlaThrValLeuArgGluGlyArgGluAsnGlu . . . AsnThrGlnArgThrSerLysLys

AlaAsnProGlyAsnProProHisMETAlaCysSerValAlaTyrSerLeuLysLysAsnLeuGlnLeuSerPro 
LysSerArgThr . . . ProIleArgAsnAlaValTrpLysAlaLeuHisAsnGln . . .ProCysAla. . . ProLys 
ThrAlaAsnLeuValAlaAspIleThrSerLeuAlaLysTyrGlnGlnValLeuLysThrLeuGlnGlyThrTyr 
Pro . . . GluGluGlyLysGluLeuPheHisProCysAspMETValLeuValLysSerLeuProSerAsnSerPro 
SerLeuAspThrSerTrpGluGlyProTyrProValIleLeuSerThrProThrAlaValLysValAlaGlyVal
GluSerTrpIleHisHisThr . . . ValLysSerTrpIleLeuProLysGluProGluAsnProGlyAspAsnAla 
SerTyrSerCysGluProLeuGluAspLeuArgLeuLeuPheLysGlnGlnProGlyGlyLys . . . LeuLysSer 
. . . IleProMETAlaLeuProTyrHisIlePheLeuPheThrValLeuLeuProSerPheThrLeuThrAlaPro 
ProProCysArgCysMETThrSerSerSerProTyrGlnGluPheLeuTrpArgMETGlnArgProGlyAsnIle
AspAlaProSerTyrArgSerLeuSerLysGlyThrProThrPheThrAlaHisThrHisMETProArgAsnCys 
TyrHisSerAlaThrLeuCysMETHisAlaAsnThrHisTyrTrpThrGlyLysMETIleAsnProSerCysPro 
GlyGlyLeuGlyValThrValCysTrpThrTyrPheThrGlnThrGlyMETSerAspGlyGlyGlyValGlnAsp
GlnAlaArgGluLysHisValLysGluValIleSerGlnLeuThrArgValHisGlyThrSerSerProTyrLys

GlyLeuAspLeuSerLysLeuHisGluThrLeuArgThrHisThrArgLeuValSerLeuPheAsnThrThrLeu 
ThrGlyLeuHisGluValSerAlaGlnAsnProThrAsnCysTrpIleCysLeuProLeuAsnPheArgProTyr 
ValSerIleProValProGluGlnTrpAsnAsnPheSerThrGluIleAsnThrThrSerValLeuValGlyPro
LeuValSerAsnLeuGluIleThrHisThrSerAsnLeuThrCysValLysPheSerAsnThrThrTyrThrThr 
AsnSerGlnCysIleArgTrpValThrProProThrGlnIleValCysLeuProSerGlyIlePhePheValCys
GlyThrSerAlaTyrArgCysLeuAsnGlySerSerGluSerMETCysPheLeuSerPheLeuValProProMET 
ThrIleTyrThrGluGlnAspLeuTyrSerTyrValIleSerLysProArgAsnLysArgValProIleLeuPro
PheValIleGlyAlaGlyValLeuGlyAlaLeuGlyThrGlyIleGlyGlyIleThrThrSerThrGlnPheTyr
TyrLysLeuSerGlnGluLeuAsnGlyAspMETGluArgValAlaAspSerLeuValThrLeuGlnAspGlnLeu 
AsnSerLeuAlaAlaValValLeuGlnAsnArgArgAlaLeuAspLeuLeuThrAlaGluArgGlyGlyThrCys 
LeuPheLeuGlyGluGluCysCysTyrTyrValAsnGlnSerGlyIleValThrGluLysValLysGluIleArg
AspArgIleGlnArgArgAlaGluGluLeuArgAsnThrGlyProTrpGlyLeuLeuSerGlnTrpMETProTrp
IleLeuProPheLeuGlyProLeuAlaAlaIleIleLeuLeuLeuLeuPheGlyProCysIlePheAsnLeuLeu
ValAsnPheValSerSerArgIleGluAlaValLysLeuGlnMETGluProLysMETGlnSerLysThrLysIle
TyrArgArgProLeuAspArgProAlaSerProArgSerAspValAsnAspIleLysGlyThrProProGluGlu 
IleSerAlaAlaGlnProLeuLeuArgProAsnSerAlaGlySerSer . . . SerGlyArgArgProThrSerPro 
ThrAlaLeuArgPheSerCys . . . AspGlyGly 

(2) INFORMATION FOR SEQ ID NO: 28: env protein (open reading frame 2)

(i) SEQUENCE CHARACTERISTICS: 
(B) TYPE: amino acid, 
(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

SerSerPheArgArgThrLysAsnArgProLeuProTrpArgArgLeuAlaThrAspPheThrHisLysProLys 
ProGlnGlyPheGlnTyrLeuLeuValTrpValAspThrPheThrGlyTrpAlaGluAlaPheProCysArgThr 
GluLysAlaGlnGluValIleLysAlaLeuValHisGluIleIleProArgPheGlyLeuProArgGlyLeuGln
SerAspAsnSerProAlaPheGlnAlaThrValThrGlnGlyValSerGlnAlaLeuGlyIleArgTyrHisLeu
HisCysAla . . . ArgProGlnSerSerGlyLysValGluLysMETAsnGluThrLeuLysGlyHisLeuLysLys 
GlnThrGlnGluThrHisLeuThrTrpProAlaLeuLeuProIleAlaLeuLysArgIleCysAsnPheProGln
LysAlaGlyLeuSerProTyrGluMETLeuTyrGlyArgProPheIleThrAsnAspLeuValLeuAspProArg
GlnProThr. . . LeuGlnThrSerProPro . . . ProAsnIleAsnLysPheLeuLysHisTyrLysGluProIle
ProGluLysArgGluLysAsnTyrSerThrLeuValThrTrpTyr . . . SerSerProPheProLeuIleProHis 
Pro. . . IleHisProGlyLysAspProThrGlnSerPheTyrLeuProGlnLeuArgLeuLysTrpLeuGluTrp 
SerLeuGlyTyrIleThrLeuGluSerAsnProGlyTyrCysGlnArgAsnLeuLysIleGlnGluThrThrLeu
AlaIleProValAsnLeu. . . ArgIleCysAlaCysSerSerAsnAsnAsnGlnGluGluSerAsn . . .AsnHis
LysSerProTrpProSerLeuIleIlePhePheSerLeuLeuPhePheTyrProLeuSerLeuSerLeuHisPro
LeuHisAlaAlaVal . . . ProValAlaProLeuThrLysSerPheTyrGlyGluCysSerValProGluIleLeu 
METProHisArgIleGlyValPheLeuArgGluProProProSerLeuProThrProIleCysProAlaThrAla
IleThrLeuProLeuPheAlaCysMETGlnIleLeuIleIleGlyGlnGluLys . . . LeuIleLeuValValLeu
GluAspLeuGluSerLeuSerValGlyLeuThrSerProLysLeuValCysLeuMETGlyValGluPheLysIle 
ArgGlnGluLysAsnMET. . . LysLys. . . SerProAsnSerProGlyTyrMETAlaProLeuAlaProThrLys 
Asp. . . IleSerGlnAsnTyrMETLysProSerValProIleLeuAlaTrp. . . AlaTyrLeuIleProProSer 
LeuGlySerMETArgSerArgProLysThrLeuLeuThrValGlyTyrAlaSerPro . . . ThrSerGlyHisMET 

PheGlnSerLeuTyrLeuAsnAsnGlyThrThrSerAlaGlnLys . . . ThrProLeuProPhe . . . . . . AspLeu
LeuPheProIleTrpLys . . . ProIleProGlnThrSerProVal . . . AsnLeuAlaIleLeuHisThrGlnPro
ThrProAsnAlaSerGlyGly . . . LeuLeuProHisLys . . . SerAlaTyrProGlnGluTyrPheLeuSerVal 
ValProGlnProIleValVal . . . METAlaLeuGlnAsnLeuCysAlaSerSerHisSer . . .CysProLeu. . . 
ProSerThrLeuAsnLysIleTyrThrValMETSerTyrLeuSerProAlaThrLysGluTyrProPhePheLeu 
LeuLeu. . . GluGlnGluCys . . .ValHis. . . ValLeuAlaLeuAlaValSerGlnProLeuLeuSerSerThr 
ThrAsnTyrLeuLysAsn. . . METGlyThrTrpAsnGlySerProThrProTrpSerProCysLysIleAsnLeu 
ThrPro. . .GlnGln. . . SerPheLysIleGluGluLeu . . . ThrCys . . . ProLeuLysGluGlyGluProVal 
TyrPhe . . . GlyLysAsnAlaValIleMETLeuIleAsnProGluSerSerLeuArgLysLeuLysLysPheGlu
IleGluTyrAsnValGluGlnArgSerPheGluThrLeuAspProGlyAlaSerSerAlaAsnGlyCysProGly 
PheSerProSer . . .AspLeu. . .GlnLeu. . . TyrCysTyrSerSerLeuAspProValSerLeuThrSerLeu 
LeuThrLeuSerLeuProGluSerLysLeu . . . AsnTyrLysTrpSerProArgCysSerProArgLeuArgSer 
ThrAlaAspProTrpThrGlyLeuLeuAlaHisAspLeuMETLeuMETThrSerLysAlaProLeuLeuArgLys 
SerGlnLeuHisAsnLeuTyrTyrAlaProIleGlnGlnGluAlaValArgAlaValValGlyGlnProProGln 
GlnHisLeuGlyPheProValGluMETGly 

(2) INFORMATION FOR SEQ ID NO: 29: env protein (open reading frame 3)

(i) SEQUENCE CHARACTERISTICS:
(B) TYPE: amino acid,
(D) CONFIGURATION: linear

(ii) TYPE OF MOLECULE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

AlaProSerGlyGluGlnArgThrGlyHisTyrProGlyGluAspTrpGlnLeuIleLeuProThrSerProAsn 
LeuArgAspPheSerIleTyr . . .SerGly. . . IleLeuSerArgValGlyGlnArgProSerProValGlyGln

LysArgProLysArg ArgHis . . . PheMETLys . . . PheProAspSerAspPheProGluAlaTyrArg 

ValThrIleAlaLeuLeuSerArgProGln. . . ProArgGluTyrProArgArg . . . ValTyrAspIleThrTyr
ThrAlaProGluGlyHisSerProGlnGlyArgSerArgLys . . . METLysHisSerLysAspIle . . .LysSer 
LysProArgLysProThrSerHisGlyLeuLeuCysCysLeu. . .Pro. . . LysGluSerAlaThrPheProLys 
LysGlnAspLeuAlaHisThrLysCysCysMETGluGlyProSer . . . ProMETThrLeuCysLeuThrGlnAsp 
SerGlnLeuSerCysArgHisHisLeuLeuSerGlnIleSerThrSerSer . . . AsnIleThrArgAsnLeuSer
LeuArgArgGlyLysArgThrIleProProLeu. . . HisGlyIleSerGlnValProSerLeu . . .PheProIle
ProArgTyrIleLeuGlyArgThrLeuProSerHisPheIleTyrProAsnCysGly . . . SerGlyTrpSerGly
ValLeuAspThrSerHisLeuSerGlnIleLeuAspThrAlaLysGlyThr . . . LysSerArgArgGlnArg . . .
LeuPheLeu. . . ThrSerArgGlyPheAlaProAlaLeuGlnThrThrThrArgArgLysValThrLysIleIle
AsnProHisGlyProProLeuSerTyrPheSerLeuTyrCysSerPheThrLeuPheHisSerHisCysThrPro 
SerMETProLeuTyrAspGln . . . LeuProLeuProArgValSerMETGluAsnAlaAlaSerArgLysTyr . . . 
CysProIleVal . . .GluSerPhe. . . GlyAsnProHisLeuHisCysProHisProTyrAlaProGlnLeuLeu 
SerLeuCysHisSerLeuHisAlaCysLysTyrSerLeuLeuAspArgLysAsnAsp . . .Ser. . . LeuSerTrp 
ArgThrTrpSerHisCysLeuLeuAspLeuLeuHisProAsnTrpTyrVal . . . TrpGlyTrpSerSerArgSer 
GlyLysArgLysThrCysLysArgSerAsnLeuProThrHisProGlyThrTrpHisLeu . . . ProLeuGlnArg 
ThrArgSerLeuLysThrThr . . . AsnProProTyrProTyrSerProGlyLysProIle . . . TyrHisProHis 
TrpAlaPro. . . GlyLeuGlyProLysProTyr . . . LeuLeuAspMETProProProGluLeuGlnAlaIleCys
PheAsnProCysThr . . . ThrMETGluGlnLeuGlnHisArgAsnLysHisHisPheArgPheSerArgThrSer 
CysPheGlnSerGlyAsnAsnProTyrLeuLysProHisLeuCysLysIle . . . GlnTyrTyrIleHisAsnGln
LeuProMETHisGlnValGlyAsnSerSerHisThrAsnSerLeuProThrLeuArgAsnIlePheCysLeuTrp
TyrLeuSerLeuSerLeuPheGluTrpLeuPheArgIleTyrValLeuProLeuIleLeuSerAlaProTyrAsp
HisLeuHis. . . ThrArgPheIleGlnLeuCysHisIle . . . AlaProGlnGlnLysSerThrHisSerSerPhe
CysTyrArgSerArgSerAlaArgCysThrArgTyrTrpHisTrpArgTyrHisAsnLeuTyrSerValLeuLeu 
GlnThrIleSerArgThrLysTrpGlyHisGlyThrGlyArgArgLeuProGlyHisLeuAlaArgSerThr . . .
LeuProSerSerSerSerProSerLysSerLysSerPheArgLeuAlaAsnArg . . . LysArgGlyAsnLeuPhe
IlePheArgGlyArgMETLeuLeuLeuCys . . . SerIleArgAsnArgHis . . .GluSer. . . ArgAsnSerArg
SerAsnThrThr . . . SerArgGlyAlaSerLysHisTrpThrLeuGlyProProGlnProMETAspAlaLeuAsp 
SerProLeuLeuArgThrSerSerSerTyrAsnIleAlaThrProLeuTrpThrLeuTyrLeu . . . ProProCys
. . .LeuCysLeuPheGlnAsnArgSerCysLysThrThrAsnGlyAlaGlnAspAlaValGlnAsp. . .AspLeu 

ProGlnThrProGlyProAlaCys . . .ProThrIle. . . Cys . . . . . . HisGlnArgHisProSer . . .GlyAsn

LeuSerCysThrThrSerThrThrProGlnPheSerArgLysGlnLeuGluArgSerSerAlaAsnLeuProAsn 
SerThr. . . ValPheLeuLeuArgTrpGly 



(2) INFORMATION FOR SEQ ID NO: 30: G1F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 : 
GGACCATAGAGGACACTCCAGGACTA

(2) INFORMATION FOR SEQ ID NO: 31: G1R



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 : 
CCTCAGTCCTGCTGCTGGATCATCT 

(2) INFORMATION FOR SEQ ID NO: 32: G2F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CCTCCAAGCAGTGGGAGGAAGAGAATT

(2) INFORMATION FOR SEQ ID NO: 33: G2R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 : 
CCTTCCCTGTGTTATTGTGGACATCATT 



(2) INFORMATION FOR SEQ ID NO: 34: G4F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 : 
GGAAGAAGTCTATGAATTATTCAATGATGT

(2) INFORMATION FOR SEQ ID NO: 35: G3F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 



(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 : 
GGGACACAGAATCAGAACATGGAGATT

(2) INFORMATION FOR SEQ ID NO: 36: G4R



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GCCTTCAGAAGAGTCAGGTGACAGAGA



(2) INFORMATION FOR SEQ ID NO: 37: GSR

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
GAGCCTCCAAAGTCCACTTGCCTGA 

(2) INFORMATION FOR SEQ ID NO: 38: E1F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
GATTTCAGTATCTACTAGTCTGGGTAGAT 

(2) INFORMATION FOR SEQ ID NO: 39: E1R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
CTAGGAAATCCAGCTAGTCCTGTCTCA 

(2) INFORMATION FOR SEQ ID NO: 40: E2F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CCAAGACAGCCAACTTAGTTGCAGACAT

(2) INFORMATION FOR SEQ ID NO: 41: E2R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 : 
GGACGCTGCATTCTCCATAGAAACTCTT 

(2) INFORMATION FOR SEQ ID NO: 42: E3F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 : 
GCAATACTACATACACAACCAACTCCCAA 

(2) INFORMATION FOR SEQ ID NO: 43: E3R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GGGGGAGGCATATCCAACAGTTAGTA 

(2) INFORMATION FOR SEQ ID NO: 44: E4F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
CCATCTACACTGAACAAGATTTATACACTT

(2) INFORMATION FOR SEQ ID NO: 45: E4R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
AATGCCAGTACCTAGTGCACCTAGCACT 

(2) INFORMATION FOR SEQ ID NO: 46: E5F

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
CGAATACAACGTAGAGCAGAGGAGCTTCGAA



(2) INFORMATION FOR SEQ ID NO: 47: E6F



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single

(D) CONFIGURATION: linear

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
AGCCCAAGATGCAGTCCAAGACTAAGAT

(2) INFORMATION FOR SEQ ID NO: 48: E5R

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) NUMBER OF STRANDS: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GCGTAGTAGAGGTTGTGCAGCTGAGAT

(2) INFORMATION FOR SEQ ID NO: 49: ExF

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49 : 
CCCTTACCAAGAGTTTCTATGGAGAAT 

(2) INFORMATION FOR SEQ ID NO: 50: ExR

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleotide 

(C) STRANDS NUMBER: single 

(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: cDNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50 : 
ACCGCTCTAACTGCTTCCTGCTGAATT 
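
The primers listed above can be checked against the env sequence of SEQ ID NO: 23 by a simple string search. The following is a minimal Python sketch under stated assumptions: matching is exact, with no tolerance for mismatches, and the helper names `reverse_complement` and `primer_site` are illustrative. The template used is a single 75-nucleotide line of SEQ ID NO: 23 (the line encoding APPPCRCMTSSSPYQEFLWRMQRPG), on which the forward primer ExF and the reverse primer E2R both fall.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_site(template, primer):
    """Return (strand, 0-based start) of the first exact match, else None.

    "+" means the primer matches the template as written; "-" means its
    reverse complement matches, as expected for a reverse primer.
    """
    template = template.upper()
    primer = primer.upper()
    pos = template.find(primer)
    if pos >= 0:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos >= 0:
        return ("-", pos)
    return None


# One line of SEQ ID NO: 23, copied from the listing.
ENV_FRAGMENT = (
    "GCACCCCCTCCATGCCGCTGTATGACCAGTAGCTCCCCTTACCAAGAGTTTCTATGGAGAATGCAGCGTCCCGGA"
)
EXF = "CCCTTACCAAGAGTTTCTATGGAGAAT"   # SEQ ID NO: 49, stated length 27
E2R = "GGACGCTGCATTCTCCATAGAAACTCTT"  # SEQ ID NO: 41, stated length 28

if __name__ == "__main__":
    for name, primer in (("ExF", EXF), ("E2R", E2R)):
        print(name, len(primer), primer_site(ENV_FRAGMENT, primer))
```

The same search also confirms that each primer has the length stated in its (A) LENGTH field once the spaces introduced by scanning are removed.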

(2) INFORMATION FOR SEQ ID NO: 51: gag protein

(i) SEQUENCE CHARACTERISTICS: 
(B) TYPE: amino acid, 
(D) CONFIGURATION: linear 

(ii) TYPE OF MOLECULE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

TSFVEKANGVKCHKYKLSFHXETTHNYVKSVIYALQEAFRVYLPILPASPTPSPTNKDPPSTQMVQKEIDKRVNSEPKSA 
NIPQLXPLQAVGGREFGPARVHVPFSLPDLKQIKTDLGKFSDNPDGYIDVLQGLGQFFDLTWRDIMSLLNQTLTPNERSA 
TITAAXEFGDLWYLSQVNDRMTTEEREXFPTGQQAVPSLDPHWDTESEHGDWCCRHLLTCVLEGLRKTRKKSMNYSMMST
ITQGREENPTAFLERLREALRKRASLSPDSSEGQLILKRKFITQSAADIRKKLQKSAVGPEQNLETLLNLATSVFYNRDQ 
EEQAEQDKRDXKKGHRFSHDPQASGLWRLWKREKLGKLNAXXGLLPVRSTRTLXKRLSKXKXAAPSSMPLISRESLEGPL 
PQGT KVLX VRS HX P D / S S S RT

CLAIMS

1. A purified nucleic acid fragment, characterized
in that it comprises all or part of a sequence encoding
a human endogenous retroviral sequence, which has at
least env-type retroviral motifs, corresponding to the
sequence SEQ ID NO: 1 or to a sequence exhibiting a
level of homology with the said sequence SEQ ID NO: 1
greater than or equal to 80% on more than 190
nucleotides or greater than or equal to 70% on more
than 600 nucleotides for the env-type domains.
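
The two homology thresholds of claim 1 can be expressed as a simple test. The following is a minimal Python sketch, assuming that "level of homology" is read as percent identity over an ungapped, pre-aligned region of equal length; the claim itself does not specify the alignment method, and the function names are illustrative.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_env_threshold(candidate, reference):
    """Apply the env-type criteria of claim 1 to one aligned region.

    Assumption: the region length stands in for "more than 190" or
    "more than 600" nucleotides of comparison.
    """
    length = len(reference)
    identity = percent_identity(candidate, reference)
    return (length > 190 and identity >= 80.0) or (length > 600 and identity >= 70.0)


if __name__ == "__main__":
    reference = "ACGT" * 50                      # 200 nt toy reference
    candidate = list(reference)
    for i in range(40):                          # introduce 40 mismatches (20%)
        candidate[i] = "A" if candidate[i] != "A" else "C"
    candidate = "".join(candidate)
    print(percent_identity(candidate, reference), meets_env_threshold(candidate, reference))
```

A real comparison would first align the candidate to SEQ ID NO: 1 with a dedicated alignment tool, since insertions and deletions are not handled by this position-by-position count.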

2. The nucleic acid fragment as claimed in
claim 1, characterized in that it has retroviral motifs
corresponding to an env domain and corresponding to the
sequence SEQ ID NO: 1 and retroviral motifs
corresponding to a gag domain and corresponding to the
sequence SEQ ID NO: 2 or to a sequence exhibiting a
level of homology greater than or equal to 80% on more
than 190 nucleotides or greater than or equal to 70% on
more than 600 nucleotides for the env-type domains and
a level of homology greater than or equal to 90% on
more than 700 nucleotides or greater than or equal to
70% on more than 1 200 nucleotides for the gag-type
domains, the said motifs having no insertion or
deletion of more than 200 nucleotides.

3. A nucleic acid fragment, characterized in that
it comprises a segment of a sequence as claimed in
claim 1 or claim 2 and in particular the sequences
SEQ ID NO: 3-24, the complementary nucleic sequences
and the reverse sequences complementary to the
preceding sequences as well as fragments derived from
the coding regions of the preceding sequences
corresponding to a shifting frame greater than or equal
to 14 nucleotides or their complementary sequences.

4. Transcripts, characterized in that they are
generated from the sequences as claimed in any one of
claims 1 to 3.

5. A diagnostic reagent for the differential
detection of complete or partial human endogenous
nucleic sequences, having retroviral motifs, selected
from the sequences SEQ ID NO: 1 and/or SEQ ID NO: 2,
characterized in that it is selected from the group
consisting of the sequences SEQ ID NO: 1-50, the
complementary nucleic sequences and the reverse
sequences complementary to the preceding sequences, of
nucleotide fragments capable of defining or of
identifying the sequences SEQ ID NO: 1 and/or
SEQ ID NO: 2 and any flanking sequence or any sequence
overlapping them as well as of fragments derived from
the coding regions of the sequences SEQ ID NO: 1-24,
corresponding to a shifting frame greater than or equal
to 14 nucleotides or their complementary sequences,
optionally labeled with an appropriate label.

6. The reagent as claimed in claim 5,
characterized in that it is chosen from the regions
situated between nucleotides 3065 and 4390, nucleotides
6965 and 9550 of SEQ ID NO: 3.

7. The reagent as claimed in claim 5,
characterized in that it is selected from the sequences
SEQ ID NO: 30-50, and in that it is capable of being
used as a primer.

8. The reagent as claimed in claim 5,
characterized in that it is selected from the following
sequences:

- a fragment of 1505 nt amplified by the pair
of primers SEQ ID NO: 30 and SEQ ID NO: 31 (primers G1F
and G1R),

- a fragment of 2529 nt amplified by the pair
of primers SEQ ID NO: 38 and SEQ ID NO: 39 (primers E1F
and E1R),

and in that it is capable of being used
as a probe.
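
The fragment sizes quoted in claim 8 correspond to the distance spanned by each primer pair on the template. The following is a minimal Python sketch of that calculation, assuming exact primer matches on a single linear template; the function names and the toy template are hypothetical, and the full template needed to confirm the 1505 nt and 2529 nt figures is not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by a primer pair, or None if not found.

    The forward primer must match the template as written; the reverse
    primer must match through its reverse complement, downstream of the
    forward primer site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start)
    if end < 0:
        return None
    return end + len(reverse_site) - start


if __name__ == "__main__":
    # Toy template: 4 nt flank, 6 nt forward site, 10 nt insert,
    # 6 nt reverse site, 4 nt flank.
    toy = "TTTT" + "ACGTAC" + "G" * 10 + "CCATGA" + "AAAA"
    print(amplicon_length(toy, "ACGTAC", "TCATGG"))
```

Applied to the relevant template with SEQ ID NO: 30 and 31, or SEQ ID NO: 38 and 39, the returned value could be compared with the sizes stated in the claim.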

9. A method for the rapid and differential
detection of the endogenous retroviral nucleic
sequences of the env or env and gag type, their normal
or pathological variants, by hybridization and/or gene
amplification, carried out using a biological sample,
which method is characterized in that it comprises:

(a) a step in which a biological sample to be
analyzed is brought into contact with at least one
probe as claimed in claim 5, claim 6 or claim 8, and

(b) a step in which the product(s) resulting
from the nucleotide sequence-probe interaction is
detected by any appropriate means.

10. The method of detection as claimed in claim 9,
characterized in that it comprises:
* prior to step (a):
. a step of preparing the relevant biological
tissue or fluid,
. a step of extracting the nucleic acid to be
detected, and
. at least one gene amplification cycle carried
out with the aid of at least one reagent as claimed in
any one of claims 5 to 7, and
* subsequent to step (b):
. a step of comparing the nucleic sequences
obtained in the said biological sample with the human
endogenous retroviral sequences as claimed in any one
of claims 1 to 3, by any appropriate means and in
particular by sequencing, Southern blotting,
restriction cleavage, SSCP or any other method which
makes it possible to identify an insertion or a
deletion or a single mutation between the various
sequences compared.
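
The comparison step above, which aims to identify an insertion, a deletion or a single mutation between sequences, can be sketched for the case where the two sequences have already been aligned with `-` as the gap character. This is an illustrative Python sketch only: the function name is hypothetical, the alignment itself is assumed to come from another tool, and the example strings are a short stretch taken from the pairwise alignment in the figures.

```python
# Hypothetical helper: classify column-by-column differences between a
# reference sequence and a sample sequence that are already aligned.
def classify_differences(reference: str, sample: str):
    """Return (1-based column, kind, reference base, sample base) tuples.

    kind is "insertion" when the reference has a gap, "deletion" when the
    sample has a gap, and "substitution" when both bases are present but differ.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have the same length")
    events = []
    for column, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1):
        if ref_base == sample_base:
            continue
        if ref_base == "-":
            kind = "insertion"
        elif sample_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        events.append((column, kind, ref_base, sample_base))
    return events


if __name__ == "__main__":
    reference = "TGGGAGCT--GTTTTCATGC"
    sample = "TGGGAGCTCTGTTTTCATGC"
    for event in classify_differences(reference, sample):
        print(event)
```

In practice the listed laboratory methods (sequencing, Southern blotting, restriction cleavage, SSCP) supply the data; the sketch only shows how aligned sequence output might be tabulated.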

11. A method of detecting the transcripts as
claimed in claim 4, characterized in that it comprises:

- collecting messenger RNAs obtained from
control biological samples and from a similar sample
collected from patients, and

- the qualitative and/or quantitative analysis
of the said mRNAs by in situ hybridization, by dot-blot,
Northern blotting, RNAse mapping or RT-PCR, with
the aid of a diagnostic reagent as claimed in any one
of claims 5 to 8.

12. Translational products, characterized in that they
are encoded by a nucleotide sequence as claimed in any
one of claims 1 to 3.

13. A peptide, characterized in that it is capable of
being expressed with the aid of a nucleotide sequence
selected from the group consisting of the sequences
SEQ ID NO: 1-24 as claimed in any one of claims 1 to 3,
according to the combinations offered by usage of the
different possible open reading frames.
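
The "different possible open reading frames" of a nucleotide sequence can be illustrated by translating it in all six frames (three on each strand). The following is a minimal Python sketch assuming the standard genetic code; the function names are hypothetical, `*` marks a stop codon and `X` marks any codon containing a character other than A, C, G or T. The example input is the start of sequence 01 in the multiple alignment of the figures.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; unknown codons give 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATCCCCATGGCCCTCCCTTATCAT").items():
        print(frame, peptide)
```

For this input, frame +1 reads IPMALPYH, which matches the beginning of the protein sequence shown in FIGURE 4.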

14. The peptide as claimed in claim 13, characterized
in that it includes the derived peptides comprising
between 5 and 540 amino acids.

15. The peptide as claimed in claim 13 or claim 14,
characterized in that it is selected from:

. the sequences SEQ ID NO: 25-29 and
. the sequence SEQ ID NO: 51.

16. The peptide as claimed in any one of claims 13 to
15, characterized in that it is obtained from nucleic
sequences as claimed in any one of claims 1 to 3, in
which at least one non-sense codon may be replaced with
a codon encoding one of the following amino acids: Phe
(F), Leu (L), Ser (S), Tyr (Y), Cys (C), Trp (W), Gln
(Q), Arg (R), Lys (K), Glu (E) or Gly (G).
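
As an illustration of the codon replacement described above, the following Python sketch enumerates which amino acids can be reached from the three stop codons of the standard genetic code by changing a single nucleotide. The single-substitution restriction is an assumption made for this sketch; the claim does not state how many nucleotides are changed. The function and variable names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

# One-letter codes of the eleven amino acids listed in the claim.
CLAIMED = set("FLSYCWQRKEG")


def single_substitution_products(codon: str) -> set:
    """Amino acids encoded by codons that differ from `codon` at one position."""
    products = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid != "*":  # ignore stop-to-stop changes
                products.add(amino_acid)
    return products


REACHABLE = set().union(*(single_substitution_products(c) for c in STOP_CODONS))

if __name__ == "__main__":
    print("stop codons:", STOP_CODONS)
    print("reachable by one substitution:", sorted(REACHABLE))
    print("listed but not reachable:", sorted(CLAIMED - REACHABLE))
```

Under that assumption the sketch reaches ten of the eleven listed residues. Phe is the exception, since TTT and TTC each differ from every stop codon at two positions.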

17. An antibody, characterized in that it is directed
against one or more of the peptides as claimed in any
one of claims 13 to 16.

18. A method for the differential immunological
screening of normal or pathological human endogenous
retroviral sequences of the HERV-7q family,
characterized in that it comprises bringing a
biological sample into contact with an antibody as
claimed in claim 17, the reading of the result being
visualized by an appropriate means, in particular EIA,
ELISA, RIA, fluorescence.

19. A method for the identification and detection of
endogenous retroviral motifs which are abnormally
expressed in the context of pathological conditions
associated with cancer, or of neuropathological
conditions, in particular autoimmune neuropathological
conditions, at the forefront of which is multiple
sclerosis, characterized in that it comprises the
comparative analysis of the sequences extracted from a
biological sample and the sequences as claimed in any
one of claims 12 to 16.

18. An application of the sequences as claimed in any
one of claims 1 to 6 or 12 to 16 to the diagnosis of,
to the prognosis of, to the evaluation of genetic
susceptibility to, any induced, congenital or acquired
human diseases, in particular those with cancerous,
autoimmune and/or neurological components, such as
multiple sclerosis, the associated syndromes and the
neurodegenerative diseases in which all or part of the
sequences as claimed in any one of claims 1 to 5 and
related endogenous or exogenous forms are involved.

19. Hybrid nucleic sequences, characterized in that
they comprise sequences or motifs as claimed in any one
of claims 1 to 5, combined with sequences or motifs of
endogenous origin or of exogenous origin or induced
exogenously.

20. A recombinant cloning or expression vector, 
characterized in that it comprises a nucleic sequence 
as claimed in any one of claims 1 to 4.

CCC7CGCGCGGGC77C C777C7GGGA7 G AGGCC AA AA.CGC C7 GG AC AT AC AGC AA77 A7C 77CC AA 7 TCAG

AGACAGGACTAGCTGGATTTCCTAGGCCGACTAAGAATCCCTAAGCCTAGCTGGGAAGGTGACCACGTCCAC 

C77TAAACACGGGGC7TCCAAC77AGCTCACACC7CACCAA7CACAGAGCTCAC7AAAA7GC7AAT7ACGCA 

AAGACAGGAGG7AAAGAAATAGCCAATCATCTAT7GCCTGAGAGCACAGCAGGACG0ACAACAATCGGGA 

TAAACCCAGGCATTCGAGCTGGCAACAGCAGCCCCCCT7TGGGTCCC7TCCCTTTG7A7GGGAGCTG777TC 

ATGCTATTTCACTCTATTAAATC77GCAAC7GCACTC77C7GGTCCA7GTTTCT7ACGGCTCGAGC7GAGCT 

TTTGC7CACCGTCCACCACTGCTGTT7GCCACCACCGCAGACC7GCCGC7GAC7CCCATCCCTC7GCATCCT 

GCAGGGTGTCCGC7GTGCTCCTGATCCAGCGAC^X:GCCCA7TGCCGC7CCCAA7TGGGC7AAAGGC77GCCA 

TTGT7CCTGCACGGG7AAGTGCCTGGGT7TGT7C7AA7TGAGC7GAACACTAGTCACTGGG7TCCATGG77C 

TC7TCTGTGACCCACGGCTTC7AATAGAACTA7AACACTTACCACATGGCCCAAGA77CCAT7CCT7GGAA7 

CCC 7G AGGCC AAG AAC7CCAGC7C\GAGAA7 ACGACGC77GCCACC A7CTTGGAAGC CCCC 7GC7 AC CA7C7 

TGG AAGTGCT7C ACC ACCATCT7GGGAGC7C 7G7C AGC AAGG ACC C Z CCGG7 AAC A 777TGGCAACCACG AA 

CGGACATCCAAAG7GGTGAG7AA7A7TGGACCACT7TCACT7GCTATTC7GTCCTATCC7TCCT7AGAA77G 

GAGGAAAATACCGGGCACTTG7CGGCCAG7TAAAAACGATTAG7G7GCCCACCGGACT7AAGACTCAGG7GT 

GAGGCTATCTGGGGAAGGGCTT7CTAACAACCCCCAACCCT7C7GGGTTGGGGAC7TGG7T7GCC7CAAGCC 

AGCT7CCAC77TGAG7TTTC77GGGGAAGCCGAGCGCCGACTAGAGGCAGAAAGC7C7CG7CC7GAAC7CC7 

GGCAGTAGCCGG7TGAGA7C\7GGTG7AGCCAGAAGTC7CAACAGTCGCCCATGCA7GCACCCC7A7C777C 

CTTCTGACCCATACCTCC7GGGTCCCAACCACAAC7TTC7TCAAAG7GTAGCCCCAAAATTCTCC77ACC7C 

TGAATATAC7TCC7CTGATCCC7GCCTCCTAGG7ACTATTGG7TCAGACTTCCA7TTCC7CTAGCAAi7rrG7 

ATCrCCAAAGGGATCTAAGGAAGCTCTGCGC7GCG7CCrrAGGCACCTAGGCTATAACCCAGGGAGTC77A7 

CCCTGGTGTCCCTCCCAA77TAGGCATACAGCTCTTGACA7GGGCAG7TATG7AGGACC^ 

CCT7GCCAGGGCCCCAAGTT7G7AAATGGC7GAC^AAAAGAGAGACAGAGGAGAGA^ 

GAAAGAGAGAGAGACAGAGAGGAGAGAGAGACAG7GAGAGAGACACAAGAGAGAGAGAGACAAAGAGGAGAG 

AGAGAGACTCAAAGAGAGAAAGAAAGAGAAACAAA7AGTAAAAAACAGTCTCCCC7 

GGG7AAATT7AAAACCTG7ACT7GATAAT7GAAGGTC7TC7C7G7GACCC7ATAGCAC7CCAATCCA 

TGGTCAGTG7AAATAAGAGCATAGGCCCAAAGCACTGAGGCCATTGACAACCCGTAGCT^ 

70(^7 AACCCAG7AACCCGCAGA7GGACCAAA7GCA77CAG7CGG7AGCGCAAC7GCTT7GC7AAAAG7AGA 

AAAG7AAC7T77AGAGGAAACC7CA77G7GAGCACACCTCACC7G77CAGAA77AT7C7AA7AAAAA^ 

AAAAGGTAGCTTACTAACTCAAAAATCrrAAAXITATGGGGCTA77CTG7TAGAAAA 

ACCACrGA7AA77CCCT7AACCCAGCAGA7T7CCrrAACGGGA7TTAAA7CTTAA7TACCATACAA^ 

ACCAGACCTAGC^GCAACTCCCTTCAGGACAGGACGATAGATGGTTCCTCCC^^ 

CACAATGGGTATTCAGTAATTGA7ACGGGGAGTCTTG7GGAAGGAGAGT7AGAAAAAT7^ 

TC7CCTCAAACGTCTGAGC717rrTGCACrCAGCCAAGCX77AAAG7ACTTACAGAATCAA^ 

ATCCTGATTCAAAAGCTTAGC7ACACCC7C7CrG7AATGCATTTGCATAACAACTTGT 

C7TGATGGGGCAGC7GG G 1 ! 7 G ; rATAAAATAGGAACCCAGCCCAGCTCTAGCA C7CACCCC7GACCGCAAAC 

GCAATGTTGGGCATGCTGGTAAAGGACCACTAGAATCCAGCAGCCCAGACCC C 1 1 "CT rTG7GG7CAAGAAA 

GGCGGGAAAAGGGG7GCAGGAC7GC7ACA7CGG7AAGCA7AAC7AATCCGATAAACAGAGG7CCA7GC^^ 

TTACGCACCCTGGAAAGGAACrCACCCCrGAGCACAAAGGCAATGTTGGGCACGCTGGTAAACGACCACTAG 

AATCCAGCAGCC7GGACCC C 7 Z T C : 71 G7GG7C AAGAGAGGC AGG AAAAC AGG7 GC AGGAC7GC AACA7C AG 

TGAGCATAAC7AAT7CGA7AAGCAGAGG7CC\7GGGTGG7GA7GCACCC7GCAAAGAA 7AAGCATTAGCACC 

A7ACAGGACACTCC^CK5ACrAAAGCrCATCGGAAAATGACTAGGG77 

AGA7GGGAAACG77CCCCGCAAGACAAAAACGCCCCTAAGACCrrA77C7GGAGAAT7GG^ 

CTCVGACACrAAGAAAGAAACCACrrATArTCTTCTGCAGTGCCG iXTGC^^ 

TA TAACACCA TCTTACAGCT AG A CCTC ff TIG TAGAAAAGGCAAA TGGAGTGAACTGCCA TAAGTACAAA CT 

rrcr tttca rr aa gagacaactcacaa tta totaaaaagtgtga ttta tgccctacaggaagccttcagagt 

C 7.AC CTCCC 7 A TCCCAGCA TCCCCGACTCCTTCCCCAAC7AATAAGGACCCCCCTTCA.\CCCAA.\ 7GG7-CA 
AAAGGAGA TAGACAAAAGGGTAAACAGTGAACCAAAGAGTGCCAA TA TTCCCCAATTA TGACCCCTCCAAGC 
AGTGGGAGGAAGAGAA TTCGGCCCAGCCAGAGTGCA TGTGCC TTTT TCTCTCCCAGACTTAAAGCAAA TAAA 
AACAGACTTAGGTAAA TTCTCAGA TAACCCTGA TGGCTA TA TTGA TOTi' 1 1 .ACAA GGGTTAGGACAA 7TCT7 
TGA TCTGACA TGGAGAGATATAATGTCACTGCTAAATCAGACACTAACCCCAAATGAGAGAAGTGCCACCA T 
AACTGCAGCCTGAGAGTTTGGCGA TCTCTGGTA TCTCAGTCAGGTCAATGATAGCATGACAACAGAGGAAAG 
AGAATGATTCCCCACAGGCCAGCAGGCAGTTCGCAGTCTAGACCCTCATTGGGACACAGAATCAGA^ 
AGATTGGTGCTGCAGACA TTTGCTAAC TTGTG r G CTAGAAGGACTAAGGAAAACTAGGAAGAAGTCTA TGAA 
TTACTCAA TGA TGTCCACCA TAACACAGGGAAGGGAAGAAAA TCCTACTGCCTTTCTGGAGAGACTAAGGGA 
GGCATTGAGGAAGCGTG C CTC TCTGTC A CCTGACTC T T C TG A AGGCCAACTAATCTTAAAGCGTAAGTTTA7 
CACTCAGTCAGCTGCAGACATTAGAAAAAAACTTCAAAAGTCTGCttTX 

CCTA TTGAACTTGGCAACC TCGG I riTTTA TAATAGAGA TCAGGAGGAGCAGGCGGAACAGGACAAACGGGA 
TTAAAAAAAAGGCCACCGCTT7AGTCA TGACCCTCAGGCAAGTGGACTTTGGAGGCTCTGGAAAAGGGAAAA 
GCTGGGCAAATTGAA TGCCTAA TAGGGL I LGCl TCCAGTGCGGTCTACAAGGACACTTTAAAAAAGATTGTC 
CAAGTAGAAGTAAGCCGCCCCCTCGTCCA TGCCCC TTA TTTCAAGGGAA TCA CTGGAAGGCCCACTGCCC CA 
GGGGACAAAGGTCCTC7GAGTCAGAAGCCACTAACCAGA TGA TCCAGCAGCAGGACTGAGGGTGCCTGGGGC 
AACCGCCATCCCATGCCATCACCCrCAGACAGCCCTGGG7ATGCrTGACCATTC 

CCTGGACACTGGT^CGGTCrrCTTACTCTTAC7CTTCTG7CCGGGACAACTG7CCTCCAGA7C7GTCA^C7A7 
CTGAGGGGGTCCTAAGACGGCCAGTCACTAGATACTTCTCCCAGCCACTAAGTTATGACTGGGGAGCT7TAT 
TCTTTTCAC A7GC T T T TC7AA77 A7GC77G AAAGCCCC AC7 ACCTTGTT AGGGAGAG ACA77C7 AGCAAAAG 
CAGGGGCCA77A7ACACCTGAACA7AGGAGAAGGAACACCCGT77G77G7CCCC7GC77GAGGAAGGAA77A 
ATCC7GAAG7CTCK^AAC^GAAC^ACAA7A7GGACGAGCAAAGAA7GCCCGTCC7 

AGGA77CCACC7CC777CCC7ACCAAAGGCAG7ACCCCC7CAGACCCAAGGCCCAACAAGGAC7CCAAAAGA 
77GT7AAGGACCTAAAAGCCCAAGGCCTAG7AAAACCA7GCAG7AACCCC7GCAG7AC7CCAA7777AGGAG 
7ACAGAAACCCAACAGACAG7GGAGG77AG7GCAAGA7C7CAGGA77A7CAA7GAGGCTCr: tCl TCCTCTA7 
AGCCAGCTG7ACC7AGCCC77A7ACTCTGCTTTCCCAAATACCAGAGGAAGCAGAG7GG77TACAG7CCTGG 
ACCT7CAGGATGCCrTCTTC7GCATCCCTGTACATCCTGACTCTCAAT7CTTGT7TGCCT77GAAGA7AC77 
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CAAACCCAACATCTCAACTCACCTGGACTATTTTACCCCAAGGCTTCACGCATACTCCCCATCTA7TTCCCC 

AGGCA7TACCCCAACACTTGACCCAATCCTCATACCTGGACACTTCTCCTTCGGTAGG7GCA7GA7TTACTT 

TTGGCCGCCCA7TCAGAAACC7TCTCCCATCAAGCCACCCAAGCGCTCTTCAATTTCCTCGCTACC7G7GGC 

TACATGGTTTCCAAACCAAACGCTCAACTCTGCTCACAGCAGGTTACTTAGGGCTAAAA7TATCCAAAGGCA 

CCACGGCCCTCAGTGAGGAACACATCCAGCCTATACTGGCTTATCCTCATCCCAAAACCCTAAAGCAACTAA 

GGGGATTCCTTGGCGTAATAGGTTTCTGCCGAAAATGOATTCCCAGGTATGGCGAAATAGCCAGG7CA77AA 

ATACACTAATTAAGGAAACTCAGAAAGCCAATACCCATTTAGTAAGATGGACAAC7GAAGTAGAAGTGGCTT 

7CCAGGCCCTAACCCAAGCCCCAGTGTTAAGTTTGCCAACAGGGCAAGACTTTTCTTCATATGTCACAGAAA 

AAACAGGAATAGCTCTACGAGTCCTTACACAGATCCGAGGGATGAGC7TCCAACCTCTGGCA7ACC7GAC7A 

AGGAAATTGATG7AG7GGCAAAGGG77GACCTCATTG7T7ACGGQ7AGTGGTGGCAG7AGCAGTC7TAG7AT 

CTGAAGCAGTTAAAATAATACAGGGAAGAGATCTTACTGTGTGGACATC7CATGATGTGAATCGCATACTCA 

C7GCTAAAGGACACTTGTGGCTGTCACACAACTCTTTACTTAAATGTCAGGCTCTAT7ACTTGAAGGGCCAG 

7GC7GCGAC7G7GCACrTGTGCAACTC7TAACCCAGCCACATT7C7TCCAGACAATGAAGAAA-AGA7AAAAC 

A7AAC7G7CAACAAG7AA7T7CTCAAACCTATGCCAC7CGAGGGGACCT7T7AGAGG77CC7T7GACTGA7C 

CCGACCTCAAC7TGTATAC7GA7GGAAGTTCCTTTG7AGAAAAAGGACT7CGAAAAG7GGGG7ATGCAGTGG 

7CAG7GATAA7GGAATACTTGAAAGTAATCCCCTCACTCCAGGAAC7AG7GC7CAGC7AGCAGAACTAATAG 

CCCTCACTTGGGCACTAGAA77AGGAGAAGAAAAAAGGGCAAA7ATA7A7ACAGAC7C7AAATA7GC77ACC 

7AG7CCTCCATGCCCATGCAGCAATATGGAAAGAAAGGGAATTCC7AAC7TC7GAGAGAACACCTA7CAAAC 

A7CAGGAAGCCATTAGGAAAtTA7TATTGGCTGTACAGAAACCTAAAGAGGTGGCAGTCTTACAC7GCCGGG 

G7CA7CAGAAAGGAAAGGAAAGGGAAA7AGAAGAGAACTGCCAAGCAGATAT7GAAGCCAAAAGAGC7GCAA 

GGCAGGACCCTCCATTAGAAATCC7TA7AAAACAACCCCTAGTATAGGGTAATCCCC7CCGGGAAACCAAGC 

CCCAG7ACTCAGCAGGAGAAACAGAA7GGGGAACCTCACGAGGACAG7T7TC7CCCC7CGGGACGGC7AGCC 

ACTGAAGAAGGGAAAATACTT77GCCTGCAAC7ATCCAATGGAAArrACTTAAAACCCTTCATCAAACCTT7 

CAC7TAGGCATCGATAGCACCCATCAGATGGCCAAA7CATTATTTACTGGACCAGGCC7TTTCAAAACTATC 

AAGCAGATAG7CAGGGCCTG7GAAGTGTGCCAGAGAAATAATCCCCTGCC TTA7CGCCAAGCTCCT7CAGGA 

GAACAAAGAACAGGCCA 77ACCC7GGAGAAGAC7GGCAAC7GA TTTTA CCCACAAGCCCAAA CC 7C AGGGA 7 

TTCAGTA TCTAC7AG7CTGGGTAGA TACTTTCACGGCTTGGGCA GAGGCC77CCCC7G TA GO A CA GAAAAGG 

CCCAA GAGG TAA TAAAGGCACTAGTTCATGAAATAATTCCCAGA 77CGGAC77CCCCGAGGCTTACAGAG7G 

ACAA TA GCCC ZGCTTTCCA GGCCA CAG 7AA CC CA GGGAGTA TCCCAGGCG77AGGTA TACGA TA TCACTTAC 

ACTGCGCCTGAAGGCCACAGTCCTCAGGGAAGGTCGAGAAAA TGAA TGAAACACTCAA.\GGACA 7C7AAAAA 

AGCAAACCCAGGAAACCCACC7CACA TGGCCTGCTCTGTTGCCTA 7 AGCC77 AAAAA GA.A TC7GCAAC7TTC 

CCCAAAAAGCAGGACTTAGCCCATACGAAA TGCTGTA TGGAAGGCCC7TCATAACCAA7GACC77G7GCT7G 

ACCCAAGACAGCCAACTTAGTTGCAGACA TCACCTCCTTAGCCAAA TA TCAACAAGTTCTTAA.AACA 77 AC A 

AGGAACCTA TCCCTGAGAAGAGGCAAAAGAACTA T7CCACCCTTGTGACA 7GG7A77AGTC AAG7CCC7TCC 

CTCTAA TTCCCCA TCCCTAGA 7 AC A TCCTGGGAAGGACCC7ACCCAG7CA TTTTA TC7ACCC CAACTGCGGT 

TAAAGTGGC7GGAGTGGAG7C77GGA 7 AC A 7CACAC7TGAGTCAAA 7CC7GGA 7AC7GCCAAAGGAACCTGA 

AAA TCCAGGAGACAACGC7AGCTA TTCCTG7GAACC7CTAGAGGA T7TGCGCCTGC7C7TCAAACAACAA CC 

AGGAGGAAAG7AAC7AAAA7CATAAATCCCCATGGCCCTCCC7TA7CA7A77777C7CTTTA 

ACCCTCTT7CACTCTCACTGCACCCCCTCCATGCCGCTGTATGACCAGTAGCTCCCC77ACCAAGAGT77C7 

A TGGAGAA TGCACCGTCCCGGAAA TA TTGA TGCCCCA TCGTA TAGGAG7CTTTCTAAGGGAACCC CCACCT7 

CACTGCCCACACCCA TA TGCCCCGCAACTGCTA TCAC7CTGCCA C7C7T7GCA 7GCA 7GCAAA 7AC7CA 77 A 

T7GGACAGGAAAAA TGA TTAA TCC7AG77GTCCTGGAGGACTTGGAGTCACTGTCTGTTGGA CT7AC77CA C 

CCAAACTGGTATGTCTGA TGGGGGTGGAGTTCAAGA TCAGGCAAGAGAAAAACA TGTAAAAGAAG7AA 7C7C 

CCAAC7CACCCGGG TA CA 7GGCA CC7C7AGCCCC7A CAAAGGAC7AGA TCTCTCAAAACTACA TGAAACCC7 

CCG7ACCCA 7 A CTCGCC7GG 7AA GCC7A 777 AA 7ACCACCC7CA C7GGGC7CCA 7GAGG7CTC GGCCCAAAA 

CCC7ACTAA C7GT7GGA TA 7GCC7CCCCC7GAAC77CAGGCCA TA TGTT7CAA TCCC7GTACC TG AACAA TG 

GAACAACT7CAGCACAGAAA7AAACACCAC77CCG7777AGTAGGACCTCT7G77 7CCAATCTGGAAATAAC 

CCA 7ACC7CAAACC7CACC7G7G7AAAA 777AGCAA 7AC7ACA 7ACACAACCAAC 7CCCAA 7GCA 7CAGG i G 

GG7AAC7CC7CCCACACAAA TAGTCTGCCTACCC TCAGGAA TATTTTTTGTCTGTGGTACCTCAGC^ - A TCG 

TTGTTTGAA TGG CTCTTCAGAA TCTATGTG CTTCCTCTCA TTCTTA GTGCCCCCTA TGACCATCTACAC7GA 

ACAAGA TTTA TACAGTTA TG7CA TA TCTAAGCCCCGCAACAAAAGA GTACCCA TTCTTC^CTTT'TGTTA TAGG 

AGCAGGAGTGC TAGGTGCA CTAGGTACTGGCA TTGGCGGTA TCA CAACCTCTACTCAGTTCTA CTACAAACT 

ATCTCAAGAACTAAATGGGGACATGGAACCCXTCGCCGACT CCCTGGTCACCT^ 

CCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAqACTTGC^ ifl'i V 

A TTTTTAGGGGAAGAA TGCTGTTA TTA TGTTAA TCAA TCCGGAA TCQTCACTQAGAAAQTT^A^GA^ ^^"Cr 

AGA 7CGAA 7 A CAA CGTAGAGCAGAGGAGCT7CGAAACACTGGA CCCTGGGGCC7CC7CAGCCA A 7GQA TQCC 

CTGGA 7TC7CCCC77C7TAGGA CC7C7AGCAGCTA TAA 7 A TTGC7A C7CCTCTTTGGACCCTG7A 7C777A.\ 

CC7CC77G77AA C7T7G7C7C77CCAGAA TCGAAGC7G7AAAAC7ACAAA TGGAGCCCAAGA 7GC AGTCCAA 

GAC7AAGA 7C 7 A CCGCA GA CCCC7GG A CCGGCCTGCTAGCCCACGA 7C7GA 7G7TAA 7GACA 7CAAACGCAC 

CCCTCCTGAGGAAA TCTCA GCTGCACAACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTT AG AGC GGTC 

Tr fWf* AAgCTCCC CAAC AGC ACTT AGGTTTTCCTGTTG AGATGGGGG ACTG AG AG AC A GG ACT AGCTGG AT 

TTCCTAGGCTGACTAAGAATCCCTAAGCCTAGCTGGGAAGGTGACCACATCCA CCTTTAAACACGGGGCTTG 

CAACTTAGCTCACACCTGACCAATCAGAGAGCTCACTAAAATGC7AATTAGGCAAAGACAGGAGG7AAAGAA 

ATAGCCAATCATCTATTGCCTGAGAGCACAGCAGGAGGGACAATGATCGGGATATAAACCCA AGTCTTCGAG 

CCGGCAACC^CAACCCCC7TTGGGTCCCCTCCCTrrGTATGGGAGCTCTGTTTTCA7GCTATTTCACTCTA7 

TAAATCTTGCAACTGCACTCTTCTGGTCCATGTTTCTTACGGCTTGAGCTGAGCTTTCGCTC GCCATCCACC 

ACTGCTGTTTGCCGCCACCGCAGACCCGCCGCTGACTCCCATCCCTCTGGATCATGCAGGGTGT CCGCTGTG 

CTCCTGATCCAGCGAGGCACCCATTGCCGCTCCCAATCGGGCTAAAGGCTTGCCATTGTTCCTG CATGGCTA 

AGTGCCTGGGTTCATCCTAATTGAGCTGAACACTAG7CACTGGGTTCCATGGTTCTCTTCTGTGACCCACAG 

CTTCTAATAGAGCTATAACACTCACCGCATGGCCCAAGGTTCCATTCCTTGAATCCATAAGG CCAAGAACCC 

CAGGTCAGAGAACACGAGGCTTGCCACCATCTTGGCAGCT CTGTGAGCAAGGACCCCCAAGTAACACAACCA 

TCAGGGTGCAAATGCATGGGCCACTAATGGTAGAGCAAGAAAACAGAAGGGCCCTGGTTCCTCGAAGGCATC 

AGTGAGCTGAAATGCCTGCCCTGGATGTCCTArrCCTAGGTGTTTrrCTGCCTGAAGCAGATTAAACCCTTT 

GTTCACTTCTCCAAGTAGGGCTTCTATTACAGCCCAAATCAATCCCCACCCCAGATGACAT 

FIGURE 1.2 



[FIGURE 1.2 margin residue: nucleotide position column running from 5255 to 10500, one marker per sequence line, with the region labels "domain" and "repeat region"]



4.1 



4.2 



FIGURE 2 






ACTGAGAGACAGGACTAGCTGGATTTCCTAGGCCGACTAAGAATCCCTAAGCCTAGCTGGGAAGGTGACC
ACTGAGAGACAGGACTAGCTGGATTTCCTAGGCTGACTAAGAATCCCTAAGCCTAGCTGGGAAGGTGACC

ACGTCCACCTTTAAACACGGGGCTTGCAACTTAGCTCACACCTGACCAATCAGAGAGCTCACTAAAATGC
ACATCCACCTTTAAACACGGGGCTTGCAACTTAGCTCACACCTGACCAATCAGAGAGCTCACTAAAATGC

TAATTAGGCAAAGACAGGAGGTAAAGAAATAGCCAATCATCTATTGCCTGAGAGCACAGCAGGAGGGACA
TAATTAGGCAAAGACAGGAGGTAAAGAAATAGCCAATCATCTATTGCCTGAGAGCACAGCAGGAGGGACA

ACAATCGGGATATAAACCCAGGCATTCGAGCTGGCAACAGCAGCCCCCCTTTGGGTCCCTTCCCTTTGTA
ATGATCGGGATATAAACCCAAGTCTTCGAGCCGGCAACGGCAACCCCC-TTTGGGTCCCCTCCCTTTGTA

TGGGAGCT--GTTTTCATGCTATTTCACTCTATTAAATCTTGCAACTGCACTCTTCTGGTCCATGTTTCT
TGGGAGCTCTGTTTTCATGCTATTTCACTCTATTAAATCTTGCAACTGCACTCTTCTGGTCCATGTTTCT

TACGGCTCGAGCTGAGCTTTTGCTCACCGTCCACCACTGCTGTTTGCCACCACCGCAGACCTGCCGCTGA
TACGGCTTGAGCTGAGCTTTCGCTCGCCATCCACCACTGCTGTTTGCCGCCACCGCAGACCCGCCGCTGA

CTCCCATCCCTCTGGATCCTGCAGGGTGTCCGCTGTGCTCCTGATCCAGCGAGGCGCCCATTGCCGCTCC
CTCCCATCCCTCTGGATCATGCAGGGTGTCCGCTGTGCTCCTGATCCAGCGAGGCACCCATTGCCGCTCC

CAATTGGGCTAAAGGCTTGCCATTGTTCCTGCACGGCTAAGTGCCTGGGTTTGTTCTAATTGAGCTGAAC
CAATCGGGCTAAAGGCTTGCCATTGTTCCTGCATGGCTAAGTGCCTGGGTTCATCCTAATTGAGCTGAAC

ACTAGTCACTGGGTTCCATGGTTCTCTTCTGTGACCCACGGCTTCTAATAGAACTATAACACTTACCACA
ACTAGTCACTGGGTTCCATGGTTCTCTTCTGTGACCCACAGCTTCTAATAGAGCTATAACACTCACCGCA

TGGCCCAAGATTCCATTCCTTGGAATCCGTGAGGCCAAGAACTCCAGGTCAGAGAATACGAGGCTTGCCA
TGGCCCAAGGTTCCATTCCTTG-AATCCATAAGGCCAAGAACCCCAGGTCAGAGAACACGAGGCTTGCCA

CCATCTTGGAAGC
CCATCTTGGGAGC

IPMALPYHIFLFTVLLPSFTLTAPPPCRCMTSSSPYQEFLWRMQRPGNIDAPSYRSLSKG
TPTFTAHTHMPRNCYHSATLCMHANTHYWTGKMINPSCPGGLGVTVCWTYFTQTGMSDGG
GVQDQAREKHVKEVISQLTRVHGTSSPYKGLDLSKLHETLRTHTRLVSLFNTTLTGLHEV
SAQNPTNCWICLPLNFRPYVSIPVPEQWNNFSTEINTTSVLVGPLVSNLEITHTSNLTCV
KFSNTTYTTNSQCIRWVTPPTQIVCLPSGIFFVCGTSAYRCLNGSSESMCFLSFLVPPMT
IYTEQDLYSYVISKPRNKRVPILPFVIGAGVLGALGTGIGGITTSTQFYYKLSQELNGDM 
ERVADSLVTIjQDQLNSIAAVVIiQNKRAI^ 

ABELRNTGPWGIiLSQWMPWIIiPFLGPIAAI I LliLLFGPC I FNIJjVNFVS SRIE AVKLQME PKMQSKTKI Y 
RRPLDRPASPRSDV NDIKGTPPEEISAAQPLLRPNSAGSS 



FIGURE 4 



1) NSLAAWLQNRRALDLLTAESGGTFLFLEEKC

2) NSLAAWLQNRRALDLLTAERGGTCLFLGEEC

3) DSLAAVTLQNHQGLDLLTAEKGGLCYFLGEDC

4) DSLAAVTLQNHQGLDLLIAEKGGLCTFLGEEC

5) DSLAAVTLQNCRGLDLLTAEKGGHYTFLGEEC

6) LQNRRGLDLLFLKEGGLC

7) DSLAKWLQNRRGLDLLTAEQGGICLALQEKC



FIGURE 5 



TS FVEKANGVKCHKYKLS FHXETTHNYVKS VI YALQE AFRVYLP I LP AS PT P S PTNKD P P S TQMVQKE I DKRVNS 

EPKSANIPQLXPLQAVGGREFGPARVHVPFSLPDLKQIKTDLGKFSDNPDGYIDVLQGIjGQFFDLTWRDIMSLLiN 

QTLTPNERSATITAAXEFGDLWYLSQViroRMTTEEREXFPTGQQAVPSLDPHWDTESEHGDWCCRHLLTCVLiEGL 

RKTRKKSMNYSMMSTITQGREENPTAFLERLREALR^^ 

EQNLETLLNIiATSVFYTtfRDQEEQAEQDKRDXKK^^ 

RLSKXKXAAPSSMPLISRESLEGPLPQGTKVLXVRSHXPD/SSSRT 

CCTGGCACTCCTGAGGGAAGTATAAATTATAACACCATCTTACAGCTAGACCTCTTTTGTAGAAAAGGCA
CCTGGC-CTCCTGAGGGAAGTATAAATTATAACACCATCTTACAGCTAGACCTCTTTTGTAGAAAAGAAG

-CAAATGGAGTGAAGTGCCATAAGTACAAACTTTCTTTTCATTAAGAGACAACTCACAATTATGTAAAAA
GCAAATGGAGTGAAGTGCCATATGTACAAACTTTCTTTTCATTAAGAGATAACTCCCAATTATGTAAAAA

GTGTGATTTATGCCCTACAGGAAGCCTTCAGAGTCTACCTCCCTATCCCAGCAT--CCCCGACTCCTTCC
GTGTGATTTATGCCCTACAGGAAGCCCTCAGAGTCTACCTCCCGACCCCAGCAAGACCCCAACTCCTTCT
CCAACTAATAAGGACCCCCCTTCAACCCAAATGGTCCAAAAGGAGATAGACAAAAGGGTAAACAGTGAAC 

CCAACTAATAAGGACCCCCCTTCAACCCAAATGGTCCAAAAGGAGATAGACAAAGGGGTAAACAATGAAC 
CAAAGAGTGCCAATATTCCCCAATTATGACCC-CTCCAAGCAGTGGGAGGAAGAGAATTCGGCCCAGCCA 



CAAAGAGTGCCAATATTACACGATTAT-ACTCGCTCCAAGCAGTGGGAGGA-GA-ATTT-GGCCCAGCCA 
GAGTGCATGTGCCTTTTTCTCTCCCAGACTTAAAGCAAATAAAAACAGACTTAGGTAAATTCTCAGATAA 

GCGTGCATGTACCTTTTTCTCTCTCAGATTTAAAGCAAATTAAAATAGACCTAGGTAAATTCTCAGATAA 
CCCTGATGGCTATATTGATGTTTTACAAGGGTTAGGACAATTCTTTGATCTGACATGGAGAGATATAATG 



CCCTGATGGCTATATTGATGTTTTACAAGGGTTAGGACAATCCTTTGATCTGACATGGAGAGATATAATG 
T C AC T GCT AAAT C AGAC AC T AACC CC AAAT G AGAGAAGTGCC ACC AT AACTGC AGC C T G AG AGT T TGGC G 

TTACTGCTAAATCAGACACTAACCCCAAATGAAAAAAGTGCTGCCATAACAGCAGCCTGAGAGTTTGGCG 
ATCTCTGGTATCTCAGTCAGGTCAATGATAGGATGACAACAGAGGAAAGAGAATGATTCCCCACAGGCCA 



AACTCTGGTATCTCAGTCAGGTCAATGATAGGATGACAACAGATGAAAGAGAATGATTCCCCACAGGCCA 
GCAGGCAGTTCCCAGTCTAGACCCTCATTGGGACACAGAATCAGAACATGGAGATTGGTGCTGCAGACAT 

GC AGGC AGTTC C C AGTGT AG AC CCT C AT T AGGAC AC AG AAT C AG AAC T T GG AGATT GGT G C C AC AG AC AT 
TTGCTAACTTGTGTGCTAGAAGGACTAAGGAAAACTAGGAAGAAGTCTATGAATTACTCAATGATGTCCA 



TT GCT AACTTGC GTGCT AGAAGG AC T AAGG AAAAC T AGG AAG AAGC C CAT G AAT TAT T C AAT G AT GT C C C 
CC AT AAC AC AGGGAAGGGAAG AAAATC CT AC T GCC TT T CTGG AG AG AC T AAGGGAGGC ATT G AGG AAGC G 

C T AT AAC AC AGGG AAAGG AAG AAAATCC T ACT GCC TTTC T GG AG AG ACT AAGGG AAGG ATT G AGG AAGC A 
TGCCTCTCTGTCACCTGACTCTTCTGAAGGCCAACTAATCTTAAAGCGTAAGTTTATCACTCAGTCAGCT 

TACCTCCCTGTCACCTGACTCTATTAAAGGCCAACTAATCTTAAAGGATAAGTTTATCACTCAGTCAGCT 
GC AG AC ATT AGAAAAAAACTTC AAAAGT C T GC CGT AGGC C CGG AGC AAAACTT AG AAAC C CT AT T G AAC T 

GC AG AG AT T AAGAAAAAACTT C AAAAGT AT GC CTT AGGC CC AG AGC AAAACTT AG AAAC CC T AC T G AAC T 
TGGCAACCTCGGTTTTTTATAATAGAGATCAGGAGGAGCAGGCGGAACAGGACAAACGGGATTAAAAAAA 

TGGC AACCTCAGTTTTTTATAATAGAGATCAGGAAGAGC AGG- GGAATGGGACAAATGGGATAAAAAAAA 
A GGCCACCGCTTTAGTCATGACCCTCAGGCAAGTGGACTTTGGAGGCTCTGGAAAAGGGAAAA 

AAAAAAAAGGTGACTGCTTTAGTCGTGGCCCTCAGGCAAATGGACTTTGGAGGCTCCAGAAAAGGGAAAA 
GCTGGGCAAATTGAATGCCTAATAGGGCTTGCTTCCAGTGCGGTCTACAAGGACACTTTAAAAAAGATTG 



GCTGAGCAAATTGAATGCCTAACAGGGCTTGCTTCTAGTGTGGTCTACAAGGACACTTTAAAAAAGATTG 
TCCAAGTAGAAGTAAGCCGCCCCCTCGTCCATGCCCCTTATTTCAAGGGAATCACTGGAAGGCCCACTGC 

TCCAAGTAGAAACAAGCTGCCCCCTTGTCCATGCCCCTTATGTCAAGGGAATCACTGGAAGGCCCACTGC 
CCCAGGGGACAAAGGTCCTCTGAGTCAGAAGCCACTAACCAGATGATCCAGCAGCAGGACTGAGGGTGCC 

CC C AGGAGAT GAAGGT C C T CT G AGT C AGAAGC C AC T AACC AG AT AAT C C AGC AGC AGG AC TG AGG AT GC C 
TGGGGCAAGCGCCATCCCATGCCATCACCCTCACAGAGCCCTGGGTATGCTTGACCATTGAGGGCCAGGA 



CAGGGCAAGCGCCAGCCCATGCCATCACCCTCACAGAGCCTTGGGTATGCTTGACCATTGAGGGCCAGGA 
GGTT GTCTCCTGGACACTGGTGCGGTCTTCTTAGTCTTACTCTTCTGTCCCGGACAACTGTCCTCC 

GGTTCACTGTCTCTTGGACACTGGTATGGCCTTCTCAGTCTTACTCTCCTGTCCTGGACAACTGTCCTTC 

01/ TAAATCCCCATGGCCCTCCCTTATCATATTTTTCT

02 / TAAATCCCC-TGGCCCTCCCTTATCATATTTTTCT 

03/ TAAATCCCCATGGCCCTCCCTTATCATATTTTTCT 

04/ TAGATCCTCATGGCCCTCC-TTGTCATATTTTTTT 



01/ CTTTACTGTTCTTTTA-CCCTCTTTCACTCTCACTGCACCCCCTCCATGCCGCTGTATGACC 
02/CTTTACTGTTCTCTTACCCCCCTTTCACTCTCACTGCACCCCGTCCATGCCACTGCACCCCC 
03/CTTTACTGTTCTCTTA-CCCCCTTTCTCTCTCACTGCACCCCCTCCATGCTGCTGTACAACC 
04/CTTTACTGTTCTCTTA-CCCCCTTTCACTCTCACTGAACCCCCTCCATGCCACTGTACTACC 

0 1 / AGT AGCTCCCCTTACCAAGAGTTTCTATGGAGAATGCAGCGT 

02/GTCCATGCCCGTCTCATGCCAGTAGCTCCCCTTAGCAAGAGTTTCTATGGAGAATGCAGCGT 

03/AGC AGCTCCCCTTACCAAGAGTTTCTATGAAGAATGCGGCTT 

04 /AGT AGCTCCCATTACCAAGAGCTTCTATGGACAATGCGGCTT 

0 1 /CCCGGAAATATTGATGCCCCATCGTATAGGAGTCTTTCTAAGGGAACCCCCACCTTCACTGC 
02/CCCGGAAATATTGATGCCCCATTGTATAGGAGTTTATCTAAGGGAACCCCCACCTTCACTGC 
03/CCCAGAAATATTGATGCCCCATCAAATAGGAGTTTACCTAAAGGAAACTCCACCTTCACTGC 
04 / CCTGGAAATATTGATGACCCATCGTATAGGAGTTTTTCTAAAGGAAACCCCATTTTCACCAC 

01/ CCACACCCATATGCCCCGCAACTGCTATCACTCTGCCACTCTTTGCATGCATGCAAATACTC 
02/CCACACCCATATGCCCCACAACTGCTATAACTCTGCCACTCTTTGCATGCATGCAAATACTC 
03/ CCACACCCATATGCCCCACAACTGCTATAACTCTGCCACTCTTTGCATGCATGCAAATACTC 

0 4 /CCACACCTATATGACCC 

0 1 / ATTATTGGACAGGAAAAATGATTAATCCTAGTTGTCCTGGAGGACTTGGAGTCACTGTCTGT 

0 2 / ATTATTGGACAGGAAAAACGATTAATCCCAGTTGTCCTGGAGGACTTGGAG ■ 

03/ATTATTGGACAGGGAAAATGATTAATCCTAGTTGTCCTGGAAGACTTGGAGCCACTGTCTGT 
04/ 

01/TGGACTTACTTCACCCAAACTGGTATGTCTGATGGGGGTGGAGTTCAAGATCAGGCAAGAGA 
02/ — GACTCACTTCACTCATACCAGTATGTCTGATGGGGGTGGAGTTCAAGATCAGGCAACAGA 
03/CGGACTTACTTCACCCATACTGGTATGTCTGAGGGGGGTGGAGTTCAAGATCAGGCAAGAGA 
04/ 

0 1 / AAAACATGTAAAAGAAGTAATCTCCCAACTCACCCGGGTACATGGCACCTCTAGCCCCTACA 
0 2 / AAAACACATAAAGGAAGTAATCTCCCAACTGACCTGGGTACATAGCACCCCTGGCCCCTACA 
0 3 / AAAACATGTAAAGGAAGTAACCTCCCAACTGACCCGGGTACATAGCACCCCTAGCCCCTACA 
04/ 

01/AAGGACTAGATCTCTCAAAACTACATGAAACCCTCCGTACCCATACTCGCCTGGTAAGCCTA 

02 / AAGGACTAGATCTCTCAAAACTACATGAAACCCTCCATACCCATACTGGCCTGGTAAGCCTA 
03/ AAGGACTAGATCTCTTAAAACTACATGAAACCCTCCATACCCATACTTGCCTGGTAAGCCTA 
04/ 

01/TTTAATACCACCCTCACTGGGCTCCATGAGGTCTCGGCCCAAAACCCTACTAACTGTTGGAT 
02/ TTTAATACCACCCTGACTGGGCTCCATGAGGTCTCGGCCCAAAACCCTACTAACTGTTGGAT 

0 3 / TTTAATACCACCCTCACTGGGCTCCATGAGGTCTCGGTCCAAAACCCTACTAACTGTTGGTT 
04/ 

0 1 / ATGCCTCCCCCTGAACTTCAGGCCATATGTTTCAATCCCTGTACCTGAACAATGGAACAACT 

02 / GTGCCTCCCCCTGCACTTTAGGCCATACATTTCAATCCCTATACCTGAACAATGGAACAACT 
03/GTGCCTCCCCCTGTATTTCAGGCCATGCATTTCAATCCCTGTACCTGAACAATGGAACAACT 
04/ T G C AC T T C AG G CC AT AC AT T T C AAT CCC TGTA 

01/TCAGCACAGAAATAAACACCACTTCCGTTTTAGTAGGACCTCTTGTTTCCAATCTGGAAATA

02/TCAGCACAGAAATAAACACCACTTCTGTTTTAGTAGGTCCTC TTTCCAATCTGGAAATA

03/ACAGCACAGAAATAAACACCACTTCCGTTTTAGTAGGACCTCTTGTTTCCAATCTGGAAATA



0 1 / ACCCATACCTCAAACCTCACCTGTGTAAAATTTAGCAATACTACATACACAACCAACTCCCA 
02/ACCCATACCTCAAACCTCACCTGTGTAAAATTTAGCAATACTATAGACACAGCCAACTCCCA 

0 3 /ACCCATACCTCAAACCTCACCTGTGTAAAATTTAGCAATACTGTAGACACAACCAACTCCCA 
04/ : 

0 1 /ATGCATCAGGTGGGTAACTCCTCCCACACAAATAGTCTGCCTACCCTCAGGT^ATATTTTTTG 
02/ATGCATCAGGTGGGTAACTCCTCCCACACGAATAGTCTGCCTACCCTCAGGAATATTTTTTG 
03/ATGCATCAGGTGGGTAACTCCTCCCACACGAATAGTCTGCCTACCCTCAGGAATATTTTTTG 
04/ 

01/TCTGTGGTACCTCAGCCTATCGTTGTTTGAATGGCTCTTCAGAATCTATGTGCTTCCTCTCA 
02/TCTGTGGTACCTCAGCCTATCATTGTTTGAATGGCTCTTCAGAATCTGTGTGCTTCCTCTCA 
03/TCTGTGGTACGTTAGCCTATCGTTGTTTGAATGGCTCTTCAGAATCTATGTGCTTCCTCTCA 
04/ 

01/TTCTTAGTGCCCCCTATGACCATCTACACTGAACAAGATTTATACAGTTATGTCATATCTAA 
02/TTCTTAGTGGCCCCTATGCCCATCTACACTGAACAAGATTTATACAATCATGTCATACCTAA 
03/TTCTTAGTGCCCCC-ATGACCATTTACACTGAACAAGATTTATACAATTATGTTGTACCTAA 

04/ ~ 

01/GCCCCGCAACAAAAGAGTACCCATTCTTCCTTTTGTTATAGGAGCAGGAGTGCTAGGTGCAC 
02/GCCCCGCAACAAAAGAGTACCCATTCTTCCTTTTGTTATTGGAGCAGGAGTGCTAGGCGGAG 
03/GCCCCACAACAAAAGAGTACTCATTCTTCCTTTTGTTATCGGAGCAGGAGTGCTAGGTGGAC 

04/ 

01/TAGGTACTGGCATTGGCGGTATCACAACCTCTACTCAGTTCTACTACAAACTATCTCAAGAA 
02/TAGCTACTGGCATTGGCGGTATCACAACCTCTACTCAGTTCTACTACAAACTGTCTCAAGAA 
03/TAGGTTCTGGCATTGGCGGTACCACAACCTCTACTCAGTTCTACTACAAACTATCTC7VAGAA 
04/ 

01/CTAAATGGGGACATGGAACGGGTCGCCGACTCCCTGGTCACCTTGCAAGATCAACTTAACTC 
02/CTTAAAGGTGACATGGAATGGGTCGCTGATACCCTGGTCACCTTGCAAGATCAACTTAACTC 
03/CTCAATGGTGACATGGAATGGGTTGCCGACTCCCTGGTCACCTTGCAAGATCAACTTAACTT 

04/ 

01/CCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCTGAAAGAGGGG 
02/CCTAGCAGCAGTAGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCGGAAAGCGGGG 
03/CCTAGCATCAGTAGTCCTTCAAAATTGAAGAGCTTTAGACTTGCTAACCTCTGAAAGAGGGG 
04/ 

01/GAACCTGTTTATTTTTAGGGGAAGAATGCTGTTATTATGTT 

02/GAACCTTTTTATTTTTAGAGGAAAAATGCTGTTGTTATGTT 

03/GAAGCTGTTTATTTTTAGGGGAAGAATGTTGTTATTATGTTATTTTAGCGGAAGAATGTTGT 

04/ 

01/ AATCAATCCGGAATCGTCACTGAGAAAGTTAAAGAAATTCGAGATCGAATACA 

02/ AATCAATCCGGAATCATCACCGAGAAAGTTAAAGAAATTCAAGGTCGAATATA 

03/TATTATGTTAATCAATCCTGAATTGTCACAGAGAAAGTTGAAGAAATTCGAGATTGAATACA 
04/ 

. 0 1 /ACGTAGAGCAGAGGAGCTTCGAAA-CACTGGACCCTGGGGCCTCCTCAGCCAATGGATGCCCT 
0 2 /ACGTAGAGCAAAGGAGCTGCAAAA-CACTGGACCCTGGGGCCTCCTCAGCCAATGGATGCCCT 
0 3 / ACGTAGAACAGAGGAGCTTCAAAAACACCAGACCCTGGGGCCTCCTCAGCCAATGGATGCCCT 
04/ 

01/GGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGCTACTCCTCTTTGGACCCTGTA
02/GGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGTTACTCCTCTTTGGACCCTGTA 
03/GGATTCTCCCCTTCTTAGGATCTCTAGCAGCTCTAATATTGATACTCCTCTTTGGACCCTGTA 
04/ 

01/TCTTTAACCTC.CTTGTTAACTTTGTCTCTTCCAGAATCGAAGCTGTAAAACTA 

02/TCTTTAACCTCCTTGTTAAGTTTGTCTTTTCCAGAATCGAAGCAGTAAAACTACAAATCGTTC 
03/TCTTTAACCTCCTTGTTAAGTTTGTCTCTTCCAGAATCAAAGTTGTAAAGCTACAAATCGTTC 
04/TCTTTAACCTCCTTGTTAAGCTTGTCTCTTGCAGAATCGAAGCTGTAAAACTACAAATGCTTG 

01/ — CAAATGGAGCCCAAGATGCAGTCCAAGACTAAGATCTACCGCAGACCCCTGGACCGGCCTG 
02/TTCAAATGGAGCCCCAGATGCAGTCCATGAGTAA7VATCTACCACGGACCCCTGGACCGGCCTG 
03/TTCAAATGGAACCCCAGATGAAGTCCATGACTAAGATCTACCGTGGACCCCTGGACCGGCCTA 
04/TTAAAATAGAGCCCCAGATGCAGTCCATGGCTAAGATCTACCACGGACCCCTGGACCGGCCTG 

01/CTAGCCCACGATCTGATGTTAATGACATCAAAGGCACCCCTCCTGAGGAAATCTCAGCTGCAC 
02 / CTAGCCCATGCTCTGATGTTAATGACATCAAAGGCACCCCTCCCGAGGAAATCTCAACTGCAC 
03/CTAGCCCATGCTCCAATTGTAATGATATCGAACGCACCCCTCCCGAGGAAATCTCAACTGCAC 
04/CTAGCCCATGCTCTGATGTTGATGACATTGAAGGCACGGCTTCCGAGGAAATCTCAACTGCAC 

0 1 / AACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACCTCCCC 
02/AACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGTGGTTGTTGGCCAACCTCCCC 
0 3 / AACCCCTACTATGCCCCAATTCCGCAGGAAGCAGTTAGACTGGTCGTCAGCCAACCTCCCC 

04 / GACCCCTACTACACCCCAATTTAGCGGGAAGCAATTAGAGCAGCCTATGGCCACCTCCCC 

CTTCCCCAACTAATAAGGACCCCCCTTTCAACCCAAACAGTCCAAAAGGACATAGACAAAGGA 3

CTTCCCCAACTAATAAGGACCCCCCTTTCAACCCAAACAGTCCAAAAGGACATAGACAAAGGA 4 

C T T C C C C AAC T AAT AAG G ACC C C C C - T T C AAC C C AAAT G GT C C AAAAGG AG AT AG AC AAAAGG 5 

C T T C T C C AAC T AAT AAG G AC CCCCC-TT C AAC C C AAAT G G T C C AAAAGG AG AT AG AC AAAG GG 6 

C T TC C C C AAAT AAT AAG AAC CCCCC-TT C AAC C C AAAC G GT C C AAAAGG AG AT AG AC AAAG GG 7 

GTAAACAATGAACCAAAGAGTGCCAATATTCCCTGGTTATGCACCCTCCAAGCGGTGGGAG — 3 

GTAAACAATGAACCAAAGAGTGCCAATATTCCCTGGTTATGCACCCTCCAAGCGGTGGGAG — 4 

GTAAACAGTGAACCAAAGAGTGCCAATATTCCCCAATTATGACCCCTCCAAGCAGTGGGAGGA 5 

GTAAACAATGAACCAAAGAGTGCCAATATTACACGATTATACTCGCTCCAAGCAGTGGGAG — 6 

GTAAACAACTAACCAAAGAATGCCAATATTCCCCGATTATGCCCCCTCCAAGCGGTGGGAG — 7 

A-AGAATTCGGCCCAGCCAGAGTGCATGTACCTTTTTCTCTCTCAC-ACTTGAAGCAAATTAAA 3 

A-AGAATTCGGCCCAGCCAGAGTGCATGTACCTTTTTCTCTCTCAC-ACTTGAAGCAAATTAAA 4 

AGAGAATTCGGCCCAGCCAGAGTGCATGTGCCTTTTTCTCTCCCAG-ACTTAAAGCAAATAAAA 5 

-GAGAATTTGGCCCAGCCAGCGTGCATGTACCTTTTTCTCTCTCAG-ATTTAAAGCAAATTAAA 6 

-GAGAATTCGGCCCAGCCAGAGTGCACGTACCTTTTTCTCTCTCTAGACTTTAAA TTAAA 7 

ATAGACNTAGGTNAATTNTCAGATAGCCCTGATGGYTATATTGATGTTTTACAAGGATTAGGA 3 

ATAGACXTAGGTXAATTXTCAGATAGCCCTGATGGXTATATTGATGTTTTACAAGGATTAGGA 4 

ACAGACTTAGGTAAATTCTCAGATAACCCTGATGGCTATATTGATGTTTTACAAGGGTTAGGA 5 

AT AG AC C T AGGT AAAT T C T C AG AT AAC C C T GAT G G C TAT AT T GAT G T T T T AC AAG G G T T AGG A 6 

ATAGACCTAGGT AAAT TCTCAGAT AAC CCTAATGGC TAT ATT GAT GTTTTACAAGGTTTAGGA 7 

T TCC TGAGTTCTTGC ACT AACCTC AAAT 1 

CAATCCTTTGATCTGACATGGAGAGATATAATATTACTGCTAAATCAGACGCTAACCTCAAAT 3 

CAATCCTTTGATCTGACATGGAGAGATATAATATTACTGCTAAATCAGACGCTAACCTCAAAT 4 

C AAT T C T T T GAT C T G AC AT G GAG AG AT AT AAT G T C AC T G C T AAAT C AG AC AC T AAC C C C AAAT 5 

C AAT C C T T T GAT C T G AC AT G GAG AG AT AT AAT G T T AC T G C T AAAT C AG AC AC T AACC C C AAAT 6 

C AAT C C T T T GAT C T GAT AT G GAG AG AT AT AAT G T T AC T G C T AAAT C AG AC AC T AAC C C C AAAT 7 

GAGAGAAGTGCCGCCATAACTGCAACCCAAGAGTTTGGCGATCCCTGGTATCTCAGTCAGGTC 1 

GAGAGAAGTGCTGCCATAACTGGAGCCCGAGAGTTTGGCAATCTCTGGTATCTCAGTCAGGTC 3 

GAGAGAAGTGCTGCCATAACTGGAGCCCGAGAGTTTGGCAATCTCTGGTATCTCAGTCAGGTC 4 

GAGAGAAGTGCCACCATAACTGCAGCCTGAGAGTTTGGCGATCTCTGGTATCTCAGTCAGGTC 5 

GAAAAAAGTGCTGCCATAACAGCAGCCTGAGAGTTTGGCGAACTCTGGTATCTCAGTCAGGTC 6 

GACAGAAGTGTCGCCGTAACTGGAGCCCGAGAGTTTGGCAATCTCTGGTATCTCAGTCAGGTC 7 

AATGACAGGATGACAACAGAGGAAAGATAATGATTCCCCACAGGCCAGCAGGCAGTTCCCAGT 1 

AATGATAGGATGACAACGGAGGAAAGAGAACGATTCCCCACAGGGCAGCAGGCAGTTCCCAGT 3 

AATGATAGGATGACAACGGAGGAAAGAGAACGATTCCCCACAGGGCAGCAGGCAGTTCCCAGT 4 

AAT GAT AGG AT G AC AAC AG AG G AAAG AG AAT GAT T C C C C AC AGGC C AG C AG G C AG T T C C C AG T 5 

AAT GAT AGG AT G AC AAC AG AT G AAAG AG AAT GAT T C C C C AC AGGC C AG C AG GC AG T T C C C AG T 6 

AATGATAGGATGACAACAGAGGAAAGAGAACGATTCCCCACAGGCCAGCAGGCAGTTCCCAGT 7 

GTAGACCCTCATTAGGACACAGAATCAGAACATGGAGATTGGTGCCGCAGACATTTGCTAACT 1 

AACT 2 

GTAGCTCCTCATTGGGACACAGAATCAGAACATGGAGATTGGTGCCGCAGACATTTACTAACT 3 

GTAGCTCCTCATTGGGACACAGAATCAGAACATGGAGATTGGTGCCGCAGACATTT 4 

CTAGACCCTCATTGGGACACAGAATCAGAACATGGAGATTGGTGCTGCAGACATTTGCTAACT 5 

GTAGACCCTCATTAGGACACAGAATCAGAACTTGGAGATTGGTGCCACAGACATTTGCTAACT 6 

GTAGACCCTCACTGGGACACAGAATCAGAACATGGAGATTGGTGCCGCAGACATTTGCTAACT 7 

TGCGTGCTAGAAGGACTAAGGAAAACTAGGAAGA TAT G AAT TAT T C AAT GAT G T C C AC T 1

TGCGTGCTAGAAGGACTAAGGAAAACTAGGAAGA C TAT G AAT TAT T C AAT GAT G T C C AC T 2 

TGCGTGCTAGAAGGACTAAGGAAAACTAGGAAGA C TAT G AAT TAT T C AAT GAT G T CC AC T 3 

T G T GT GC T AG AAG G AC T AAG G AAAAC T AGG AAG AAG T C TAT G AAT T AC T C AAT GAT G T C C AC A 5 

TGCGTGCTAGAAGGACTAAGGAAAACTAGGAAGAAGCCCATGAATTATTCAATGATGTCCCCT 6 

TGCGTGCTAGAAGGACTAAGGAAAACTAGAAAGAAGCCTGTGAGTTATTCAATGATGTCCACT 7 

ATAACACAGGGGAAAGGAAGAAAATCCTACTGCCTTTCTGGAGAGACTAAGGGAGGCATTGAG 1 

ATAACACAGGGGAAAGGAAGAAAATCCTACTGCCTTTCTGGAGAGACTAAGGGAGGCATTGAG 2 

ATAACACAGGGGAAAGGAAGAAAATCCTACTGCCTTTCTGGAGAGACTAAGGGAGGCATTGAG 3 

ATAACACAGGG-AAGGGAAGAAAATCCTACTGCCTTTCTGGAGAGACTAAGGGAGGCATTGAG 5 

ATAACACAGGG-AAAGGAAGAAAATCCTACTGCCTTTCTGGAGAGACTAAGGGAAGGATTGAG 6 

ATAACACAGGG-AAAGGAAGAAAATCCTACCGCCTTTCTGGAGTGACTAACGGAGGCATTGAG 7 

GAAGCATACC AGGCAAGTGGACATTGGAGGCTCTGGAAAAGGGAAAAGTTGGGAAAAGTA 1 

GAAGCATACC AGGCAAGTGGACATTGGAGGCTCTGGAAAAGGGAAAAGTTGGGCAAATTG 2 

GAAGCATACC AGGCAAGTGGACATTGGAGGCTCTGGAAAAGGGAAAAGTTGGGCAAATTG 3 

GAAGCGTGCC232AGGCAAGTGGACTTTGGAGGCTCTGGAAAAGGGAAAAGCTGGGCAAATTG 5 

GAAGCATACC238AGGCAAATGGACTTTGGAGGCTCCAGAAAAGGGAAAAGCTGAGCAAATTG 6 

GAAGCATACC233AGGCAAGCGGACTTTGGAGGCACTGGAAAAGGGAAAAGCTAGGCAAATCA 7 

TATGTCTAATAGGGCTTGCTTCCAGTGTGGTCTACAAGGACACTTTAAAAAAGATTGTCC-AA 1 

AATGCCTAATAGGGCTTGCTTCCAGTGCAGTCTACAAGGACGCTTTAGAAAAGATTGTCC-AA 2 

AATGCCTAA 3 

AATGCCTAATAGGGCTTGCTTCCAGTGCGGTCTACAAGGACACTTTAAAAAAGATTGTCC-AA 5 

AATGCCTAACAGGGCTTGCTTCTAGTGTGGTCTACAAGGACACTTTAAAAAAGATTGTCC-AA 6 

AATGCCTAATAGGGTTTGCTTCCAGTGCGGTCTACAAGGACACTTTAAAAAAGATTGTCCAAA 7

-TAGAAATAAGCCACCACCTCGTCCATGCCCCTTATGTCAAGGGAATCACTGGAAGGCCCACT 1 

GTAGAAATAAGCCGCCCC-TCGTCCATGCCCCTTATGTCAAGGGAATCACTGGAAGGCCTACT 2 

GTAGAAGTAAGCCGCCCCCTCGTCCATGCCCCTTATTTCAAGGGAATCACTGGAAGGCCCACT 5 

GTAGAAACAAGCTGCCCCCTTGTCCATGCCCCTTATGTCAAGGGAATCACTGGAAGGCCCACT 6 

-TAGAAATAAGCCGCCCCCTCGTCCATGCACCTCGTGTCAAGGGAATCACTGTAAGGCCCACT 7 

GCCCCAGGGGATGAAGGTCCTCTGAGTCAGAAGCCACTAACCAGATGA 1 

GCCCCAGGGGACGAAGGTCCTCTGAGTCAGAAGCCACTAACCTGATGA 2 

GCCCCAGGGGACAAAGGTCCTCTGAGTCAGAAGCCACTAACCAGATGA 5 

GCCCCAGGAGATGAAGGTCCTCTGAGTCAGAAGCCACTAACCAGATAA 6 

GCCCCAGGGGACGTAGGTCCTCTGAGTCAGAAGCCACTAACCAGATGA 7 
RTPLSTQTVQKDIDKGVNNEPKSANIPWLCTLQAVGEEFGPARVHVPFSLSHLKQIKIDG SDSPDG

KDPPSTQMVQKEIDKRVNSEPKSANIPQLPLQAVGGREFGPARVHVPFSLPDLKQIKTDLGKFSDNPDG 

YIDVLQGLGQSFDLTWRDIILLLNQTLTSNERSAAITGAREFGNLWYLSQVNDRMTTEERERFPTGQQ 

YIDVLQGLGQFFDLTWRDIMSLLNQTLTPNERSATITAAXEFGDLWYLSQVNDRMTTEEREXFPTGQQ

AVPSVAPHWDTESEHGDWCRRHLLTCVLEGLRKTRK TMNYSMMSTITQGK 

AVPSLDPHWDTESEHGDWCCRHLLTCVLEGLRKTRKKSMNYSMMSTITQGR 
GTCTACCTAGCCA-AGGCATATTCTTCTTATGTGGAACATCAACCTATATCTGCCTCCCCACTAACTGGA

GTCTGCCTACCCTCAGGAATATTTTTTGTCTGTGGTACCTCAGCCTATCGTTGTTTGA--A-TGGCTCTT
CAGGCACC-TGAACCTTAGTCT--TTCTAAGTCCCAAC-ATTAACATTGCCCCAGGAAATCAGACCC-TA

CAGAATCTATGTGC-TTCCTCTCATTCTTAGTGCCCCCTATGACCATCTACACTGAACA--AGATTTATA
TTGGTACCTGTCAAAGCTAAAGTCCCGTCAGTGCAGAGCCATACAACTAATATCCCTAT-TTATAGGGTT

CAGTTA--TGTCATATCTAA-GCCCCGCAACAAAAGAGT-ACCCAT-TC-T-TCCTTTTGTTATAGGAGC
AGGAATGGCTAC-TGCTAC-AGGAACTGGAATAGCCGGTTTATCTACTTC-ATT-A-TCCTACTACCATA 

AGGAGTG-CTAGGTGC-ACTAGGTACTGGCATTGGCGGTATCACAACCTCTACTCAGTTCTACTACAA-A 
CACTCTCAAAGAATTTCTCAGACAGTTTGCAAGAAATAATGAAATCTATTCTTACTTTACAATCCCAA-T

CTATCTCAA-GAACTAAATGGGGACATGGAACGGGTCGCCGAC-TCCCTGGTCACCTTGCAAGATCAACT 
TAGACTCTTTGGCAGCAAT-GACTCTCCAAAACCGCCGAGGCCCACACCTCCTCACTGCTGAGAAAGGAG 

TA-ACTCCCTAGCAGCAGTAGTC-CTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCTGAAAGAGGGG 
GACTCTGCACCTTCTTAGGGGAAGAGTGTTGTTTTTACACTAACCAGTCAGGGATAGT-AC-GAGAT-GC 

GAACCTGTTTATTTTTAGGGGAAGAATGCTGTTATTATGTTAATCAATCCGGAATCGTCACTGAGAAAGT

CACCTGGCATTT-ACAGGAAAGGGCTTCTGATATCAGACAATGCCTTTCAAACTCTTATACCAA CCT 

: : : : : : : : : : : :::::: : : : : : :::::: : : : : : 

TAAA-GAAATTCGAGATCGAATA-CAACGTAGAGCAGAGGA-GC-TTCGAAACACTGGACCCTGGGGCCT 
CTGGAGT TGGGCAACATGGCTTCTTCCATTTCTAGGTCCCATGGCAGCCATCTTGCTGTTACTCACC 

CCTCAGCCAATGGATGCCCTGGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGCTACTCCTC 
TTTGGGCCCTGTATTTTTAAGCTTCTTGTCAAATTTGTTTCCTCTAGGATCGAAGCCATCAAGCTACAGA 

TTTGGACCCTGTATCTTTAACCTCCTTGTTAACTTTGTCTCTTCCAGAATCGAAGC--T G-TAAA-A

TGGTCTTACAAATGGAACCCCAAATG-AGTTCAACTAACAACTTCTACCAAGGACCCCTGGAACGATCCA 

CT-ACAAATGGAGCCCAAGATGCAGTCCAAG-ACTAAGATCTACCGCAGACCCCTGGACCGGCCTG 

CTGGC--ACT-TCC-AC-T-A--GCC-T-AGAGATTCCCCTCTGGAAGACA-CTACAACTGCAGGGCCCC

CTAGCCCACGATCTGATGTTAATGACATCAAAGGCACCCCTCCTGAGGAAATCT-CAGCTGCACAACCTC 
TTCTTTGCCCCTATCCAGCAGGAAGTAGCTAGAGCGGTCATCGGCCAAATTCCC-AACAGCAGTTGGGGT 

TACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACCTCCCCAACAGCACTTAGGTT 
GTCCTGTTTAGAGGGGGG 



TTCCTGTTGAGATGGGGG 
ACCTTGCAAGATCAACTTA-ACTCCCTAGCAGCAGT-AGTCCTTCAAAATCGAAGAGCTTTAGACTTGCT

ACTTTACAATCCCAAATAAGACTCTTTGGCAGCAGTGACTC-TCCAAAACCGCTGAGGCCTAGATCTCCT
AACCGCTGAAAGAGGGGGAACCTGTTTATTTTTAGGGGAAGAATGCTGTTATTATGTTAATCAATCCGGA

CACTGCTGAAAAAGGAGGACTCTGCACCTTCTTAGGGGAAGAGTGTTGTTTTTACACTAACCAGTCAGGG
ATCGTCACTGAGAAAGTTAAAGAAATTCGAGATCGAATA--CAACGTAGAGCAGAGGAGCTTCGAAACAC

ATAG-CA-TGAGAT-GCCACCCAGCGTTTACAG-GAAAAGGCTTCTGAAATCAGACGCCTTTC-AAATTC
TGGACCCTGGGGCCTCCTCAGCCAATGGATGCCCTGGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATA 

TTATACCAA CCTCTGGAGT TGGGCAACATGGCTTCTCCCCTTTCTAGGTCCCGTGGCAGCCATC 

ATATTGCTACTCCTCTTTGGACCCTGTATCTTTAACCTCCTTGTTAACTTTGTCTCTTCCAGAATCGAAG 

TTGCTGTTACTCGCCTTTGGGCCCCGTATTTTTAACCTTCTTGTCAAATTTGTTTGGTCTAGAATCGAGG 
C--T G-TAAA-A CT-ACAAATGGAGCCCAAGATGCAGTCCAAG-ACTAAGATCTACCGCAGAC

CCATCAAGCTACAGATGGTCTTACAAATCGAACCCCAAATG-AGTTCAACTAACAACTTCTACCGAGGAC
CCCTGGACCGGCCTGCTAGCCCACGATCTGATGTTAATGACATCAAAG-GCACCCCTCCTGA-GGAAATC

CCCTGGACTGACCAGCTGGC--ACT-TCCCCTG GCC-T-AGAGAGTTCCCCTC-TGAAGGACA-C

T-CAGCTGCACAACCTCTACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACCTCC



TACAACTGCAAAGCCCCTTCTTCGCCCCTATCCAGCAGGAAGTAGCTAGAGCAGTCATCGGCCAAATTCC 
CCAACAGCACTTAGGTTTTCCTGTTGAGATGGGGG 

C-AACAGCAGTTGGGGTGTCCTGTTGAT-TGAGGG
GTCTGCCTACCCTCAGGAATATTTTTTGTCTGTGGTACCTCAGCCTATCGTTGTTTGA--A-TGGCTCTT

GTCTACCTAGCCA-AGGCATATTCTTCTTATGTGGAACATCAACCTATATCTGCCTCCCCACTAACTGGA
CAGAATCTATGTGC-TTCCTCTCATTCTTAGTGCCCCCTATGACCATCTACACTGAACA--AGATTTATA

CAGGCACC-TGAACCTTAGTCT--TTCTAAGTCCCAAC-ATTAACATTGCCCCAGGAAATCAGACCC-TA
CAGTTA--TGTCATATCTAA-GCCCCGCAACAAAAGAGT-ACCCAT-TC-T-TCCTTTTGTTATAGGAGC

TTGGTACCTGTCAAAGCTAAAGTCCCGTCAGTGCAGAGCCATACAACTAATATCCCTAT-TTATAGGGTT
AGGAGTG-CTAGGTGC-ACTAGGTACTGGCATTGGCGGTATCACAACCTCTACTCAGTTCTACTACAA-A

AGGAATGGCTAC-TGCTAC-AGGAACTGGAATAGCCGGTTTATCTACTTC-ATT-A-TCCTACTACCATA
CTATCTCAA-GAACTAAATGGGGACATGGAACGGGTCGCCGAC-TCCCTGGTCACCTTGCAAGATCAACT

CACTCTCAAAGAATTTCTCAGACAGTTTGCAAGAAATAATGAAATCTATTCTTACTTTACAATCCCAA-T 
TA-ACTCCCTAGCAGCAGT-AGTCCTTCAAAATCGAAGAGCTTTAGACTTGCTAACCGCTGAAAGAGGGG 

TAGACTCTTTGGCAGCAATGACTC-TCCAAAACCGCCGAGGCCCACACCTCCTCACTGCTGAGAAAGGAG
GAACCTGTTTATTTTTAGGGGAAGAATGCTGTTATTATGTTAATCAATCCGGAATCGTCACTGAGAAAGT

GACTCTGCACCTTCTTAGGGGAAGAGTGTTGTTTTTACACTAACCAGTCAGGGATAGT-AC-GAGAT-GC
TAAA-GAAATTCGAGATCGAATA-CAACGTAGAGCAGAGGA-GC-TTCGAAACACTGGACCCTGGGGCCT

CACCTGGCATTT-ACAGGAAAGGGCTTCTGATATCAGACAATGCCTTTCAAACTCTTATACCAA CCT

CCTCAGCCAATGGATGCCCTGGATTCTCCCCTTCTTAGGACCTCTAGCAGCTATAATATTGCTACTCCTC 

CTGGAGT TGGGCAACATGGCTTCTTCCATTTCTAGGTCCCATGGCAGCCATCTTGCTGTTACTCACC 

TTTGGACCCTGTATCTTTAACCTCCTTGTTAACTTTGTCTCTTCCAGAATCGAAGC--T G-TAAA-A

TTTGGGCCCTGTATTTTTAAGCTTCTTGTCAAATTTGTTTCCTCTAGGATCGAAGCCATCAAGCTACAGA 
CT-ACAAATGGAGCCCAAGATGCAGTCCAAG-ACTAAGATCTACCGCAGACCCCTGGACCGGCCTG 

TGGTCTTACAAATGGAACCCCAAATG-AGTTCAACTAACAACTTCTACCAAGGACCCCTGGAACGATCCA
CTAGCCCACGATCTGATGTTAATGACATCAAAGGCACCCCTCCTGAGGAAATCT-CAGCTGCACAACCTC



CTGGC--ACT-TCC-AC-T-A--GCC-T-AGAGATTCCCCTCTGGAAGACA-CTACAACTGCAGGGCCCC
TACTACGCCCCAATTCAGCAGGAAGCAGTTAGAGCGGTCGTCGGCCAACCTCCCCAACAGCACTTAGGTT 



TTCTTTGCCCCTATCCAGCAGGAAGTAGCTAGAGCGGTCATCGGCCAAATTCCC-AACAGCAGTTGGGGT 
TTCCTGTTGAGATGGGGG 



GTCCTGTTTAGAGGGGGG 